Abstract

Let $\mathcal{R}$ be the set of all finite graphs G with the Ramsey property
that every coloring of the edges of G by two colors yields a monochromatic
triangle. In this paper we establish a sharp threshold for random
graphs with this property. Let G(n,p) be the random graph on n vertices
with edge probability p. We prove that there exists a function
$\hat{c} = \hat{c}(n) = \Theta(1)$ such that for any $\varepsilon > 0$, as n tends to infinity,

$$\Pr\left[G(n, (1-\varepsilon)\hat{c}/\sqrt{n}) \in \mathcal{R}\right] \to 0$$

and

$$\Pr\left[G(n, (1+\varepsilon)\hat{c}/\sqrt{n}) \in \mathcal{R}\right] \to 1.$$

A crucial tool that is used in the proof and is of independent interest
is a generalization of Szemerédi's Regularity Lemma to a certain
hypergraph setting.

1 Introduction



1.1 Overview 

This paper brings together several important themes of combinatorics: Ramsey
properties, threshold phenomena of random graphs, and Szemerédi-type
regularity.

Ramsey properties guarantee, for an arbitrary partition of a given struc- 
ture, that a highly organized substructure can be found in at least one 
part of the partition. During the last decade of the last century there has 
been extensive study of Ramsey properties of random structures, see e.g. 
[11, 27, 29, 30, 31, 32, 13, 33]. These papers were all concerned with es- 
tablishing a threshold function for various Ramsey-type properties, either 
of random graphs, random hypergraphs or random sets of integers. For a 
binomial random graph G(n,p(n)), for instance, they provide a critical edge
probability $\hat{p} = \hat{p}(n)$ such that the limiting probability that every coloring of
a random graph G(n,p) contains certain monochromatic structures depends
on the asymptotic ratio between p and $\hat{p}$.

In all the above papers there is a multiplicative gap left between the
upper and lower bound on the threshold edge probability $\hat{p}$ (see Theorem 1.2
below). It is therefore not surprising that the natural question of whether
the gap can be closed has been around for some time. In other words, does
there exist a constant $\hat{c}$ such that the asymptotic probability that $G(n, c\hat{p})$
has a Ramsey property is either 0 or 1, depending only on whether $c < \hat{c}$ or
$c > \hat{c}$? This question is usually phrased in the specialized jargon as "does
there exist a sharp threshold?"

Sharp thresholds have been established for many random graph properties,
such as connectivity, Hamiltonicity and the existence of perfect matchings. However, such
precise results for Ramsey properties seemed out of reach until 1999, when a
general technique for settling these questions was introduced in [8]. Loosely
speaking, the main theorem in [8] showed that the question of sharpness of 
threshold for a random graph property is determined by whether the property 
is related to local or rather global graph phenomena. 

Two papers exploiting the technique from [8] for coloring questions are 
[1] and [10]. The latter paper (as well as the earlier [31]) names, as the next
natural candidate for attack, the problem of sharpness of the threshold for the
property $\mathcal{R}$ consisting of all graphs G such that in every blue-red coloring of
the edges of G there exists a monochromatic triangle. However, this problem
turned out to be more difficult than those in [1] and [10] and required a
new combinatorial approach.

In this paper we add the missing tool that enables us to crack this nut. 
It is a regularity lemma for a certain class of hypergraphs, whose edges 
consist of small subgraphs of a fixed underlying sparse graph (see Lemma 
4.13). Our lemma is a generalization of the celebrated Szemerédi Regularity
Lemma for graphs [35], and follows in the footsteps of the regularity lemma
for sparse graphs presented in [20] (see also [22] and [16], Section 8.3) and of
the hypergraph regularity lemma of Frankl and Rödl [12].

The proof of our regularity lemma, Lemma 4.13, accounts for a considerable
portion of the proof of the following sharp threshold theorem,
which is the ultimate result of our paper.

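Theorem 1.1 There exists a function $\hat{c} = \hat{c}(n) = \Theta(1)$ such that for every
$\varepsilon > 0$,

$$\lim_{n\to\infty} \Pr[G(n,p) \in \mathcal{R}] = \begin{cases} 0 & \text{if } p < (1-\varepsilon)\hat{c}/\sqrt{n}, \\ 1 & \text{if } p > (1+\varepsilon)\hat{c}/\sqrt{n}. \end{cases} \qquad (1)$$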



There is a slightly disappointing aspect of this result: although we prove that
$\hat{c}(n)$ is bounded, the natural conjecture is that $\hat{c}(n)$ converges to a positive
limit, and this does not follow from our theorem. Unfortunately, an inherent
property of the technique we use is that it can only supply such existence 
theorems but no new information as to the exact threshold probability. We 
discuss various possible extensions of Theorem 1.1 in Section 7. 



1.2 Ramsey Properties of Random Graphs 

Let us introduce the arrow notation, commonly used in Ramsey theory. For
two graphs, H and G, and an integer $r \ge 2$, we write $G \to (H)_r$ if for every
coloring of the edges of G by r colors there exists a monochromatic copy
of H. For example, it is well known that $K_6 \to (K_3)_2$. Let $\mathcal{R}$ be the set of
all graphs G such that $G \to (K_3)_2$.
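
For very small graphs the arrow relation can be verified by exhaustive search. The following Python sketch is only an illustration and not part of the argument of the paper; the function name `arrows_triangle` is ours. It enumerates all colorings of the edges of $K_n$ and looks for one without a monochromatic triangle.

```python
from itertools import combinations, product

def arrows_triangle(n, colors=2):
    """Return True if every coloring of E(K_n) with `colors` colors
    contains a monochromatic triangle, i.e. K_n -> (K_3)_colors."""
    edges = list(combinations(range(n), 2))
    index = {e: i for i, e in enumerate(edges)}
    # Each triangle is stored as the positions of its three edges.
    triangles = [
        (index[(a, b)], index[(a, c)], index[(b, c)])
        for a, b, c in combinations(range(n), 3)
    ]
    for coloring in product(range(colors), repeat=len(edges)):
        if not any(
            coloring[x] == coloring[y] == coloring[z] for x, y, z in triangles
        ):
            return False  # this coloring has no monochromatic triangle
    return True

print(arrows_triangle(5))  # False: K_5 has a triangle-free 2-coloring
print(arrows_triangle(6))  # True:  K_6 -> (K_3)_2
```

The search is exponential in the number of edges, so it is feasible only for a handful of vertices; this is one reason why the property has to be studied by indirect means for random graphs.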

A basic question studied in Ramsey theory is, given a graph H and an
integer $r \ge 2$, when is G "rich" enough for $G \to (H)_r$? Here richness can
be interpreted either as the number of edges of G, or as the ratio of edges to
vertices.

In modern graph theory, problems of this type are often studied via random
graphs. The theory of random graphs addresses questions concerning
typical graphs, or graphs "on average". The standard model for a random
graph is G(n,p), a graph on n vertices, where every one of the $\binom{n}{2}$ edges of the
complete graph belongs to G(n,p) independently, with probability p. When
studying random graphs a natural problem is: Given H, find a threshold
function $\hat{p}(n)$ such that $G(n,p) \to (H)_r$ with probability tending to 1 when
$p(n) \gg \hat{p}$ and $G(n,p) \to (H)_r$ with probability tending to 0 when $p(n) \ll \hat{p}$.
(The existence of such a threshold function is guaranteed by a general result
of Bollobás and Thomason [4].)
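
As a concrete illustration of the model, here is a minimal Python sketch that samples the edge set of G(n,p). The function name `sample_gnp` and the particular values of n and b are our own choices for the example, not notation from the paper.

```python
import random
from itertools import combinations

def sample_gnp(n, p, rng=None):
    """Sample the edge set of G(n,p): each of the C(n,2) pairs is kept
    independently with probability p."""
    rng = rng or random.Random()
    return {pair for pair in combinations(range(n), 2) if rng.random() < p}

# Illustration: p on the scale b / sqrt(n), the range relevant for triangles.
n, b = 400, 1.0
edges = sample_gnp(n, b / n ** 0.5, random.Random(0))
expected = b / n ** 0.5 * n * (n - 1) / 2
print(len(edges), "edges sampled; the expectation is", round(expected))
```

Passing an explicit `random.Random` instance makes the sample reproducible, which is convenient when experimenting with different values of b.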

In a series of papers [11, 27, 29, 30, 31], a threshold function $\hat{p}(n)$ is
determined for all graphs H. Its culmination, paper [31], establishes
$\hat{p} = n^{-1/m^{(2)}(H)}$ as the threshold for $G(n,p) \to (H)_r$, regardless of r, for all H which
are not star forests. Here $m^{(2)}(H) = \max_{F \subseteq H} (|E(F)| - 1)/(|V(F)| - 2)$. An
analogous result in the case when the vertices (and not the edges) are colored
is given in [27]. In the edge-coloring setting, the first case to be settled was
that of triangles (not counting star forests which are rather trivial).
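
The parameter $m^{(2)}(H)$ is easy to evaluate for small graphs. The sketch below is an illustration under our own conventions: the names `m2` and `complete_graph` are ours, and the maximum is taken over vertex subsets of size at least 3, which suffices because for a fixed vertex set the induced subgraph has the most edges.

```python
from fractions import Fraction
from itertools import combinations

def m2(vertices, edges):
    """Compute max over subgraphs F with |V(F)| >= 3 of
    (|E(F)| - 1) / (|V(F)| - 2), using induced subgraphs only."""
    edge_set = {frozenset(e) for e in edges}
    best = None
    for k in range(3, len(vertices) + 1):
        for subset in combinations(vertices, k):
            s = set(subset)
            e_count = sum(1 for e in edge_set if e <= s)
            ratio = Fraction(e_count - 1, k - 2)
            if best is None or ratio > best:
                best = ratio
    return best

def complete_graph(k):
    vs = list(range(k))
    return vs, list(combinations(vs, 2))

print(m2(*complete_graph(3)))  # 2, so the threshold exponent is -1/2
print(m2(*complete_graph(4)))  # 5/2
```

For the triangle this gives $m^{(2)}(K_3) = 2$, which matches the $1/\sqrt{n}$ scale appearing in Theorem 1.2.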

Theorem 1.2 ([30]) For every integer $r \ge 2$ there exist constants $c_r$ and
$C_r$ such that

$$\lim_{n\to\infty} \Pr[G(n,p) \to (K_3)_r] = \begin{cases} 1 & \text{if } p > C_r/\sqrt{n}, \\ 0 & \text{if } p < c_r/\sqrt{n}. \end{cases} \qquad (2)$$

Remark: In the above theorem, and throughout the paper, G(n,p) is usually
meant to denote G(n,p(n)), hence the limits and asymptotic notation. As
we can see, for a range of p, namely for $c_r/\sqrt{n} < p < C_r/\sqrt{n}$, this statement
is inconclusive. Similarly, in all the papers mentioned above there is a multiplicative
gap left between the upper and lower bound on the threshold edge
probability $\hat{p}(n)$.

In a recent paper [10] it was shown that in many cases this gap can be 
closed, using a general technique from [8] for proving sharpness of thresholds. 
The cases treated in [10] cover vertex-coloring when H, the graph defining the 
Ramsey property, belongs to a wide family of graphs including, for example, 
all complete graphs. Also the case of edge-coloring when H is a tree is dealt 
with. In all these instances it is shown that there exists a function $\hat{p}(n)$ such
that for every $\varepsilon > 0$, $\lim_{n\to\infty} \Pr[G(n,p) \to (H)_r] = 1$ if $p > (1 + \varepsilon)\hat{p}(n)$, and
$= 0$ if $p < (1 - \varepsilon)\hat{p}(n)$.

It is worthwhile pointing out here a difference between this kind of
"sharp threshold" statement and the previous Ramsey threshold results. Although
the results from [10] show that the transition from the non-Ramsey
region of the values of p to the Ramsey region is swift and sharper than what
was previously proven, these are only existence results. They give no further
information on the critical value of p which was calculated in previous
works.

The most basic case not treated in [10] is the case of graphs with a 
monochromatic triangle in every edge coloring, for which a (weak) threshold 
is given by Theorem 1.2. In the present paper we prove a sharp threshold 
theorem for this Ramsey property, with two colors. This is our Theorem 1.1 
stated above. 



1.3 Sharp Thresholds of Increasing Graph Properties 

In this subsection we introduce the notion of a sharp threshold for a graph
property, as well as a technique for proving sharpness of thresholds. Consider
a property $\mathcal{Q}$ of graphs on n vertices. We identify $\mathcal{Q}$ with the set of
graphs having this property, and use the notation $G \in \mathcal{Q}$ to denote the fact
that G has property $\mathcal{Q}$. We will restrict ourselves to properties which are
invariant under graph automorphisms and also distinguish an important
class of increasing graph properties, i.e. those that are preserved under edge
addition.

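Definition 1.1 We say that $\hat{p} = \hat{p}(n)$ is a threshold function for an increasing
graph property $\mathcal{Q}$ if

$$\lim_{n\to\infty} \Pr[G(n,p) \in \mathcal{Q}] = \begin{cases} 1 & \text{if } \hat{p} = o(p), \\ 0 & \text{if } p = o(\hat{p}). \end{cases}$$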



Bollobás and Thomason proved in [4] the existence of threshold functions for
all increasing set properties, and in particular for all increasing graph properties.

Some properties do not have sharper thresholds, in the sense that for all
p = p(n) which are of the same order as $\hat{p}$, we have
$0 < \lim_{n\to\infty} \Pr[G(n,p) \in \mathcal{Q}] < 1$. E.g., this is the case of the (increasing) property of containing a copy
of a given balanced graph H, the threshold for which has been established
by Erdős and Rényi [7] at $n^{-1/\rho(H)}$. Here $\rho(H)$ is the edge to vertex ratio in
H. For example, $\rho(K_k) = (k - 1)/2$, so the thresholds for the appearance in
G(n,p) of a copy of $K_3$, $K_5$, and $K_6$, respectively, are $n^{-1}$, $n^{-1/2}$ and $n^{-2/5}$.
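
The arithmetic behind these three exponents can be reproduced in a few lines of Python. The helper names `rho_complete` and `threshold_exponent` are ours and serve only to illustrate the formula $n^{-1/\rho(H)}$ for complete graphs.

```python
from fractions import Fraction

def rho_complete(k):
    """Edge-to-vertex ratio of K_k: C(k,2) / k = (k - 1) / 2."""
    return Fraction(k * (k - 1) // 2, k)

def threshold_exponent(k):
    """Exponent e such that n**e is the threshold for containing K_k."""
    return -1 / rho_complete(k)

for k in (3, 5, 6):
    print(f"K_{k}: rho = {rho_complete(k)}, threshold n^({threshold_exponent(k)})")
```

In particular $K_6$, the smallest complete graph in $\mathcal{R}$, appears only at $n^{-2/5}$, far above the $n^{-1/2}$ scale relevant for the Ramsey property.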

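But there are other properties, like connectivity, which enjoy much sharper
thresholds. Indeed, it has been proved by Erdős and Rényi in [6] that, for every $\varepsilon > 0$,

$$\lim_{n\to\infty} \Pr[G(n,p) \text{ is connected}] = \begin{cases} 0 & \text{if } p < (1-\varepsilon)\log n/n, \\ 1 & \text{if } p > (1+\varepsilon)\log n/n. \end{cases}$$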



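Definition 1.2 We say that an increasing property $\mathcal{Q}$ has a sharp threshold
if there exists a function $\hat{p} = \hat{p}(n)$, such that for every $\varepsilon > 0$:

$$\lim_{n\to\infty} \Pr[G(n,p) \in \mathcal{Q}] = \begin{cases} 0 & \text{if } p < (1-\varepsilon)\hat{p}, \\ 1 & \text{if } p > (1+\varepsilon)\hat{p}. \end{cases}$$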



Otherwise we say that Q has a coarse threshold. 

Thus, the property of being connected has a sharp threshold at $\hat{p} = \log n/n$.
Our main theorem in this paper, Theorem 1.1, states that the Ramsey property
$\mathcal{R}$ has a sharp threshold.

In [8] the first author gives a necessary and sufficient condition for an 
increasing property to have a sharp threshold. This condition will be the 
central tool used in this paper. Roughly stated, it says that a property with 
a sharp threshold is not statistically determined by a simple local reason, 
that is, by the presence or absence of finitely many edges. For example, the 
property of having a triangle as a subgraph is obviously local, and indeed 
has a coarse threshold (both the critical probability and the length of the
threshold interval are $\Theta(1/n)$), whereas it seems obvious that connectivity
is a non-local property, even though it is statistically equivalent to the absence 
of isolated vertices. 

For the Ramsey property $\mathcal{R}$, the condition in [8] asserts that in order
to establish the sharpness of its threshold, one has to show that $\mathcal{R}$ is not
influenced by the appearance of any fixed subgraph which is likely to be
contained in G(n,p), with the range of p = p(n) limited by Theorem 1.2 to
$p = \Theta(1/\sqrt{n})$. Of course, $\mathcal{R}$ is extremely influenced by the appearance of
$K_6$ which, however, is very unlikely to be present in such a G(n,p).

In this paper we will use a version of the sharpness criterion from [8], 
which follows readily from the original statement but is more suitable for 
applications. Given a graph M and a disjoint set of n vertices, let M* be an
ordered copy of M placed uniformly at random on one of the $n!/(n - |V(M)|)!$
possible locations.
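
To make the random placement concrete, the following Python sketch counts the ordered locations and plants one uniformly random copy of M into a graph on the vertex set {0, ..., n-1}. The names `num_placements` and `plant_copy`, and the representation of graphs as sets of two-element frozensets, are assumptions made for this illustration.

```python
import math
import random

def num_placements(n, m_order):
    """Number of ordered placements of m_order labeled vertices: n!/(n-m)!."""
    return math.perm(n, m_order)

def plant_copy(g_edges, m_vertices, m_edges, n, rng):
    """Return the edge set of G united with a random ordered copy M* of M."""
    # rng.sample returns distinct vertices in random order, so the injective
    # map phi is uniform over all ordered placements.
    image = rng.sample(range(n), len(m_vertices))
    phi = dict(zip(m_vertices, image))
    planted = {frozenset((phi[u], phi[v])) for u, v in m_edges}
    return {frozenset(e) for e in g_edges} | planted

triangle_vertices = ["a", "b", "c"]
triangle_edges = [("a", "b"), ("b", "c"), ("a", "c")]
print(num_placements(5, 3))  # 60
print(plant_copy([(0, 1)], triangle_vertices, triangle_edges, 5, random.Random(0)))
```

Distinct ordered placements may give the same edge set when M has nontrivial automorphisms; counting ordered copies, as in the text, sidesteps that bookkeeping.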

Theorem 1.3 Let $\mathcal{Q}$ be an increasing graph property with a coarse threshold.
Then there exist real constants $0 < c < C$, $\beta > 0$, a rational $\rho$,
and a sequence p = p(n) satisfying $cn^{-1/\rho} < p(n) < Cn^{-1/\rho}$, such that
$\beta < \Pr[G(n,p) \in \mathcal{Q}] < 1 - \beta$ for all n. Furthermore, there exist $\alpha, \xi > 0$
and a balanced graph M with density $\rho$ for which the following holds:
For every graph property $\mathcal{G}$ such that $G(n,p) \in \mathcal{G}$ a.a.s. there are infinitely
many values of n for which there exists a graph G on n vertices for which
the following holds:

(i) $G \in \mathcal{G}$,

(ii) $G \notin \mathcal{Q}$,

(iii) $\Pr[(G \cup M^*) \in \mathcal{Q}] > 2\alpha$,

(iv) $\Pr[(G \cup G(n, \xi p)) \in \mathcal{Q}] < \alpha$.

What the theorem says, intuitively, is that in the case of a coarse threshold
one can find two graphs, G and M, as follows: G is a fixed graph on n vertices
that is not a random graph but rather a pseudorandom graph, typical
of G(n,p) (actually a random choice of $G \in G(n,p) \setminus \mathcal{Q}$ will work with probability
close to 1); M is a "magical" balanced graph such that it is often the
case that adding a random copy of M to G induces the property in question,
whereas increasing the number of edges in G randomly by a constant proportion
$\xi$ does not induce the property. The addition of a copy of M corresponds
roughly to inducing a local property, in contrast to increasing the number of
edges which corresponds roughly to increasing the global density of a random
graph. Therefore the conclusion of the theorem is that the
property $\mathcal{Q}$ is "statistically local".

The typical way in which this theorem is used to prove that a property 
Q has a sharp threshold involves two steps: 

• The first step is usually easy. For a coarse property $\mathcal{Q}$, the theorem
guarantees the existence of M. A possible explanation for this would be
that M itself has the property $\mathcal{Q}$. In that case, since $\mathcal{Q}$ is a monotone
increasing property, it would no longer seem "magical" that adding
a copy of M induces property $\mathcal{Q}$. In other words, if $M \in \mathcal{Q}$ then
Assumption (iii) in the theorem is a triviality and does not enable
one to deduce anything. Therefore, as a start, one has to rule out
this possibility by showing that a balanced graph with the prescribed
density cannot have the property.

• The second step is typically more involved. One chooses an appropriate
property $\mathcal{G}$ that is typical of G(n,p) and then shows that a graph
$G \in \mathcal{G}$ with $\Pr[(G \cup M^*) \in \mathcal{Q}] > 2\alpha$ is quite "saturated", i.e. is close
to having property $\mathcal{Q}$. Therefore adding a random copy of $G(n, \xi p)$
should induce the property $\mathcal{Q}$ with probability much larger than $\alpha$,
contradicting condition (iv) of the theorem.

In [8], [1], and [10] one can see variations of this scheme used to prove 
that various graph properties have a sharp threshold. Although the basic 
technique is similar, each property presents its own difficulties and requires a 
special approach. The case of property $\mathcal{R}$, handled in this paper, has been by
far the most difficult and involved; a key technique in our approach turns
out to be regularization.

1.4 Regularity 

One of the fundamental tools in asymptotic graph theory is the well-known 
regularity lemma of Szemerédi [35] (see also [34]). Indeed, since its discovery
in the 70s, this lemma has been instrumental in the study of the structure of 
large graphs. The reader is referred to the excellent survey [24] for a thorough 
introduction to the wide range of applications of this result. 

In essence, the regularity lemma tells us that any large graph may be de- 
composed into a bounded number of quasi-random, induced bipartite graphs. 
Thus, this lemma is a powerful tool for detecting and making transparent the 
random-like behavior of large deterministic graphs. What makes the lemma 
such a powerful tool is that it reveals a quasi-random structure that enables 
one to carry out a deep quantitative analysis. 

The precise formulation of the regularity lemma is somewhat technical 
(see Section 4.1.1 for details). In this short section, we only discuss some 
points in broad terms. 

The quasi-random bipartite graphs that Szemerédi's lemma uses in its
decomposition are graphs in which the edges are uniformly distributed. The
uniformity is measured by the ratio of edges to potential edges (pairs), and
so this concept becomes trivial for graphs of vanishing density. To manage 
sparse graphs, one may adjust the notion of quasi-randomness by a natural 
rescaling, and it is a routine matter to check that the original proof extends to 
this notion, provided we restrict ourselves to graphs of vanishing density that 
do not contain 'dense patches' (see Section 4.1.2). However, the quasi-random 
structure that the lemma reveals in this case is harder to exploit than in the 
dense case, and one needs to work harder when applying the lemma to such 
'sparse graphs'. Nevertheless, there have been some successful applications 
of the lemma in this context (see [20, 23]). 

The idea of regularity has also been extended to uniform hypergraphs.
The version that is most relevant to us is the one in Frankl and Rödl [12],
which makes it possible to decompose triple systems into quasi-random structures
made up of triples together with an 'underlying' quasi-random tripartite
graph. In that setting, the density is measured by the ratio of triples to the
triangles in the underlying graph (see Section 4.2.1). Moreover, the concept
of quasi-randomness here is strong enough to allow one to prove that these
quasi-random pieces contain the same number of finite substructures as they
would had they been truly random pieces (see [12] and Nagle and Rödl [28]).

In this paper, we shall introduce yet another concept of regularity, which 
extends and merges the notions of sparse graph regularity and hypergraph
regularity. In the usual context of graph regularity, and in some more
delicate versions of it there is an invisible underlying graph behind the graph 
we look at, and the regularity expresses the distribution of specified edges 
with respect to the edges of the underlying graph. Similarly, a 3-uniform 
hypergraph can be viewed as a distinguished collection of triangles among 
all triangles of an underlying 3-partite graph. Here, we shall be interested 
in investigating the structure of sparse graphs with respect to some other 
fixed family of small subgraphs. Viewing these subgraphs as edges of a hy- 
pergraph, the lemma we prove (Lemma 4.13) may be interpreted as a sparse 
hypergraph version of the regularity lemma. Our approach is partly based
on methods from [20, 23] and [12], but it faces a further difficulty: the as- 
sumption of 'no dense patches' in the standard case (see [20, 23]) was an easy 
consequence of properties of random graphs and therefore did not play any 
significant role; the proof of the analogous fact in the setting of this paper, 
however, requires a fairly complex argument (see Section 4.4). 

1.5 Outline of the Paper

• In Section 2 we present the skeleton of the proof of the main theorem. 
It is actually a formal proof which is made extremely short and compact 
by relying on lemmas which will be proven in the rest of the paper. An 
overview of the forthcoming proofs and an (oversimplified) illustration 
will also be given there. 

• In Section 3, assuming the hypothesis of Theorem 1.3, we construct a 
family S of special subgraphs of G (called special constellations) and 
show how every triangle-free coloring of G defines a set of edges of G 
that intersects all special constellations (that is, defines a hitting set
for the family S).

• In Section 4 we prove a regularity theorem in the spirit of Szemerédi's
Regularity Lemma that provides a partition of both the vertices and the 
edges of G such that the special constellations are uniformly distributed 
with respect to this partition. 

• In Section 5, based on the regular partition found in the previous sec- 
tion, we define a core which is a central notion of this proof, and show 
some crucial properties of cores. 

• In Section 6 we show various properties of random graphs that are 
needed throughout the paper. It is there that the family $\mathcal{G}$ is defined,
and an important lemma, Lemma 2.3, is proved.

• We conclude with open questions and possible extensions. 

• At the end of the paper we include a glossary of symbols and definitions 
as well as a flowchart of constants exhibiting their mutual dependencies. 
We strongly encourage the reader to make use of both when struggling 
through our proof. 

Notation: 

In Sections 4-6 we use the following notation. For $0 < \varepsilon < 1$ and positive
reals x, y,

$x \sim_\varepsilon y$ denotes that $(1 - \varepsilon)y < x < (1 + \varepsilon)y$.

We will often abbreviate it further as follows: if $\varepsilon'$ is any function of $\varepsilon$ that
tends to zero with $\varepsilon$, and $x \sim_{\varepsilon'} y$, then we will simply write $x \approx y$.

Let G = (V, E) be a graph, $v, u \in V$ and $W \subseteq V$. We write
$e_G(W) = |E(G[W])|$ for the number of edges in the subgraph of G induced by W,
$\deg_G(v, W) = \deg(v, W)$ for the number of neighbors of v in W,
$\deg_G(v) = \deg(v) = \deg(v, V)$ for the degree of v in G, and $\mathrm{codeg}(v, u)$ for the number
of common neighbors of v and u in G, called the co-degree of v and u. The
set of neighbors of v in G is denoted by $N_G(v) = N(v)$, while $N_G(W)$ stands
for the set of vertices outside W, each having at least one neighbor in W (the
so-called open neighborhood of W).
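
The graph parameters just introduced translate directly into code. The sketch below assumes a graph stored as a dictionary `adj` mapping each vertex to the set of its neighbors; this representation and the function names are ours and are meant only as a reading aid.

```python
def e_G(adj, W):
    """Number of edges of G induced by the vertex set W."""
    W = set(W)
    return sum(len(adj[v] & W) for v in W) // 2

def deg(adj, v, W=None):
    """deg(v, W): neighbors of v inside W; with W omitted, the degree of v."""
    return len(adj[v]) if W is None else len(adj[v] & set(W))

def codeg(adj, v, u):
    """Number of common neighbors of v and u."""
    return len(adj[v] & adj[u])

def open_neighborhood(adj, W):
    """Vertices outside W having at least one neighbor in W."""
    W = set(W)
    return set().union(*(adj[w] for w in W)) - W

# A triangle on {1, 2, 3} with a pendant vertex 0 attached to 1.
adj = {0: {1}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2}}
print(e_G(adj, {1, 2, 3}), deg(adj, 1, {2, 3}), codeg(adj, 2, 3))
print(open_neighborhood(adj, {2, 3}))
```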

For a family of sets $(A_i)_{i \in I}$, we call a set T a hitting set if $T \cap A_i \neq \emptyset$
for all $i \in I$. We will often use set partitions $V = V_1 \cup \dots \cup V_t$, where
$|V_1| \le \dots \le |V_t| \le |V_1| + 1$. Such partitions will be called here equipartitions.
Finally, all logarithms are natural and will be denoted by log.
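
Hitting sets and equipartitions are simple to realize computationally. The following Python sketch is an illustration only; the function names and the use of t for the number of parts are our assumptions.

```python
def equipartition(vertices, t):
    """Split `vertices` into t consecutive parts whose sizes are
    nondecreasing and differ by at most one."""
    vertices = list(vertices)
    q, r = divmod(len(vertices), t)
    parts, start = [], 0
    for i in range(t):
        # The last r parts receive one extra vertex.
        size = q + (1 if i >= t - r else 0)
        parts.append(vertices[start:start + size])
        start += size
    return parts

def is_hitting_set(T, family):
    """True if T meets every set of the family."""
    T = set(T)
    return all(T & set(A) for A in family)

print([len(part) for part in equipartition(range(10), 3)])  # [3, 3, 4]
print(is_hitting_set({1, 4}, [{1, 2}, {3, 4}]))             # True
```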

2 Outline of the Proof 

2.1 Main steps 

Recall that $\mathcal{R}$ is the graph property that for every blue-red coloring of the
edges of a graph there exists a monochromatic triangle. Graphs that have 
this property will be called Ramsey graphs. For a non-Ramsey graph, we 
call a coloring that does not have a monochromatic triangle a triangle-free 
coloring. 

We wish to prove that $\mathcal{R}$ has a sharp threshold. By Theorem 1.2, there
exist constants $c_2$ and $C_2$ such that any threshold $\hat{p}$ for $\mathcal{R}$ satisfies

$$c_2/\sqrt{n} < \hat{p}(n) < C_2/\sqrt{n}.$$

This means that when applying Theorem 1.3 to Property $\mathcal{R}$ we may restrict
ourselves to sequences p = p(n) falling into this range, and consequently to
balanced graphs M with $\rho(M) = 2$ (i.e. average degree 4). Thus it suffices to
prove the following result, which is a mere adaptation of Theorem 1.3 to our
case. (In fact, it is slightly stronger, since the inequality in (iv) is replaced
in (4) below by convergence to 1.)

Theorem 2.1 For all $\alpha, \xi > 0$, all sequences

$$c_2/\sqrt{n} < p = p(n) < C_2/\sqrt{n}$$

and all balanced graphs M with $\rho(M) = 2$, there exists a graph property
$\mathcal{G}$ with $\lim_{n\to\infty} \Pr[G(n,p) \in \mathcal{G}] = 1$, and an integer $n_1$, such that for all
$G \in \mathcal{G} \setminus \mathcal{R}$ with $|V(G)| = n \ge n_1$, if

$$\Pr[G \cup M^* \in \mathcal{R}] > 2\alpha \qquad (3)$$

then

$$\Pr[(G \cup G(n, \xi p)) \in \mathcal{R}] = 1 - o(1). \qquad (4)$$

Note that if the assumption $G \notin \mathcal{R}$ (Assumption (ii) in Theorem 1.3) were
false then (4) would be trivial. A finer point is that if $M \in \mathcal{R}$ then (3) yields
no information on G, and hence is useless. Fortunately the following lemma
rules out this possibility. This is the typical "easy step" mentioned in the
introduction.

Lemma 2.2 If M is balanced and $\rho(M) = 2$ then $M \notin \mathcal{R}$.

Proof: It is a well known fact from the theory of random graphs (see [16],
page 66) that for any balanced graph H with $\rho(H) = \rho$ and $p = \Theta(n^{-1/\rho})$
there exists a constant $\beta > 0$ such that the probability that H appears in
G(n,p) is at least $\beta$. Hence, for any b > 0, the probability that M appears as
a subgraph of $G(n, b/\sqrt{n})$ is bounded away from zero. If $M \in \mathcal{R}$ then, by the
monotonicity of $\mathcal{R}$, this would mean that $\Pr[G(n, b/\sqrt{n}) \in \mathcal{R}]$ is also bounded
away from zero. But by Theorem 1.2, for all $b < c_2$, $\Pr[G(n, b/\sqrt{n}) \in \mathcal{R}] \to 0$.
Therefore $M \notin \mathcal{R}$.

Alternatively, and without the use of random graphs, Lemma 2.2 follows
from a result in [25] which shows that any Ramsey graph (for $K_3$) must have
a subgraph H for which $\rho(H) \ge 5/2$. □

The rest of the paper is devoted to proving that (3) implies (4). An
approach to statements like (4), which has become standard by now, is via
the so-called two-round exposure. This technique originated in the seventies,
in work of Pósa, Ajtai, Komlós, Szemerédi, Fernandez de la Vega, Fenner
and Frieze, devoted to the existence of Hamilton paths and cycles in random
graphs (see [3], Chapter VIII, for references). In the context of Ramsey
properties, it was explored already in [29] and [30] (see also [15], Sections 1.1
and 8.4). For a graph G = (V, E) and a set of edges $F \subseteq E$, let Base(F) be
the set of edges in the complete graph on V that form a triangle with two
edges of F, formally:

$$\mathrm{Base}(F) = \{uv : uw, wv \in F \text{ for some } w \in V\}.$$

We often identify a subset of edges of a graph with the spanning subgraph
consisting of them. So, in the above, both F and Base(F) can be viewed as
graphs on the same vertex set V.
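
The operator Base is straightforward to compute: every pair of neighbors of a common vertex w in F contributes one edge. The Python sketch below is an illustration in which edges are stored as sorted pairs of integers; the function name `base` is ours.

```python
from itertools import combinations

def base(F):
    """Return Base(F) = {uv : uw and wv are in F for some vertex w}.
    Edges are represented as pairs (u, v) with u < v."""
    adj = {}
    for u, v in F:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    result = set()
    for nbrs in adj.values():
        # Any two neighbors of the same vertex are joined in Base(F).
        result.update(combinations(sorted(nbrs), 2))
    return result

# A star with center 0 and three leaves: its Base is the triangle on the leaves.
print(base({(0, 1), (0, 2), (0, 3)}))
```

Note that Base(F) is monotone in F and may contain edges of F itself, for instance when F already contains a triangle.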

Suppose we want to show that $G_1 \cup G_2 \in \mathcal{R}$ with high probability, where
$G_i = G(n, b_i/\sqrt{n})$, i = 1, 2, and the two random graphs are independent.
(This is, in fact, our case with $b_2 = \xi b_1$, except for two minor points: that
$G_1$ is not random but pseudorandom, and that $b_1$ may depend on n.)

First generate the edges of $G_1$, and let them be colored by an adversary.
Suppose that at least half of them are blue and call the set of blue edges
Blue. Clearly, if Base(Blue) contains a triangle $\Delta$, no triangle-free coloring of
$G_1 \cup \Delta$ can extend the adversarial coloring: every edge of $\Delta$ closes a triangle
with two blue edges and must therefore be red, which makes $\Delta$ itself red.
Therefore, it will suffice to show
that $\mathrm{Base}(\mathrm{Blue}) \cap G_2$ contains a triangle. (See Figure 1.)




Figure 1: No triangle-free coloring of $G_1 \cup \Delta$ can extend the given one



To this end we utilize the following lemma, proved in Section 6. Given
two real numbers $0 < \lambda < 1$ and $0 < a < 1/6$, we say that a graph G has
property $T(\lambda, a)$, if for any subgraph F of G with at least $\lambda|E(G)|$ edges,
the graph Base(F) contains at least $a|V(G)|^3$ triangles.
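
For toy graphs, property T can be tested by brute force. The sketch below is only an illustration: the function names are ours, and the check enumerates subgraphs F with exactly $\lceil \lambda |E(G)| \rceil$ edges, which suffices because Base(F), and hence its triangle count, can only grow when edges are added to F.

```python
import math
from itertools import combinations

def base(F):
    """Base(F): pairs of vertices having a common neighbor in F."""
    adj = {}
    for u, v in F:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    result = set()
    for nbrs in adj.values():
        result.update(combinations(sorted(nbrs), 2))
    return result

def count_triangles(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is counted once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

def has_property_T(n, g_edges, lam, a):
    """Brute-force test of T(lam, a) for a graph on n vertices."""
    g_edges = sorted(g_edges)
    need = math.ceil(lam * len(g_edges))
    return all(
        count_triangles(base(F)) >= a * n ** 3
        for F in combinations(g_edges, need)
    )

k4 = list(combinations(range(4), 2))
print(has_property_T(4, k4, 0.9, 1 / 16))  # F must be all of K_4: 4 triangles
print(has_property_T(4, k4, 0.5, 1 / 16))  # a 3-edge path has an empty triangle count
```

The enumeration is exponential in the number of edges, so this is useful only for building intuition; Lemma 2.3 is what supplies the property for random graphs.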

Lemma 2.3 For all $\lambda > 0$ and c > 0, there exists a > 0, such that if
$p \ge c/\sqrt{n}$ then, with probability 1 - o(1), the random graph G(n,p) has
property $T(\lambda, a)$.

Applying the above lemma with F = Blue, $\lambda = 1/2$ and $c = b_1$ would
yield $\Theta(n^3)$ triangles in Base(Blue). In the second round, by a standard
application of a correlation estimate from [15] (see [16], Section 2.2), often
called Janson's inequality, at least one of these triangles will be included
in $G_2$ with very high probability. If we were allowed to take $b_2$ sufficiently
large, then we could make the reciprocal of the error probability larger than
the exponential number of all bi-colorings of $G_1$, proving $G_1 \cup G_2 \in \mathcal{R}$ with
probability 1 - o(1).

Unfortunately, in our case $b_2 = \xi b_1$ depends on an a priori given $\xi$ and can
be much smaller than $b_1$. This major difficulty demonstrates a more general
problem of establishing a sharp threshold without knowing up front what the
exact value of the critical constant $\hat{c}$ should be.

Hence we must seek a refinement of the above approach, making use of
the assumption (3). And indeed, through a special regularization of G we will
be able to construct a family CORE of subgraphs of G such that for every
triangle-free coloring of G at least one of these subgraphs is monochromatic.
Moreover, the size of each subgraph $K \in \mathrm{CORE}$ will be large enough to yield,
via Lemma 2.3 and Janson's inequality, at least one triangle in
$\mathrm{Base}(K) \cap G(n, \xi p)$ with probability very close to one, but at the same time the size of
this family, |CORE|, multiplied by the error probability will tend to 0. This is
the content of the following two lemmas which together imply Theorem 2.1.
They are preceded by a setup, which we will often be referring to in the
paper.

Setup: 

For the rest of the paper, let us fix constants $\alpha, \xi > 0$, a sequence
$c/\sqrt{n} < p = p(n) < C/\sqrt{n}$, where $c = c_2$ and $C = C_2$, and a graph $M \notin \mathcal{R}$ as in
Theorem 2.1 (it is no longer relevant that M is balanced). We will define
a graph property $\mathcal{G}$ in Definition 6.1 so that, in particular, each $G \in \mathcal{G}$ has
Property $T(\lambda, a)$ with $\lambda = \lambda(\alpha, c, C, M)$ to be specified later (or see the
Glossary now) and $a = a(\lambda, c)$ determined by Lemma 2.3.

We do not attempt to compute explicitly the integer $n_1$ promised in Theorem 2.1
and appearing in the next lemma. In principle, it is the maximum
of all values of n encountered throughout the proof, most notably the $n_0$'s
in Theorem 4.10 and Lemma 4.13, as well as of several implicit lower bounds
on n hidden in our calculations.

Lemma 2.4 Let $G \in \mathcal{G} \setminus \mathcal{R}$ be a graph with $|V(G)| = n \ge n_1$ for which the
assumption (3) of Theorem 2.1 holds. Then for every $\tau > 0$ there exists a
family CORE of subgraphs of G such that

(a) For every triangle-free coloring $\chi$ of E(G), there exists $K \in \mathrm{CORE}$
which is monochromatic under $\chi$.

(b) $|\mathrm{CORE}| \le \exp(\tau n^{3/2})$,

(c) For every $K \in \mathrm{CORE}$ we have $|K| \ge \lambda|E(G)|$.

After thorough preparations in Sections 3 and 4, the family CORE will be 
constructed in Section 5. Also there, all three conclusions of the above lemma 
will be proven in the following manner: 

(a) follows from Lemma 3.5 and Lemma 5.1, 

(b) is Lemma 5.2, and 

(c) follows from Lemma 3.5, Lemma 5.3 and Property (P3) of $\mathcal{G}$.

A specific value of $\tau$, with which we apply Lemma 2.4, is provided by
another application of Janson's inequality. We say that a subgraph $F \subseteq G$
survives if $F \cup G(n, \xi p)$ has a triangle-free coloring in which F is monochromatic.

Lemma 2.5 For every $K \in \mathrm{CORE}$, the probability that K survives is at
most $\exp\{-2\tau_0 n^{3/2}\}$, where

$$\tau_0 = \frac{a^2 \xi^6 c^6}{2a\xi^3 c^3 + 2\xi^5 C^5}.$$

Proof: By Lemma 2.4(c) and the fact that G has property $T(\lambda, a)$ there are at least
$an^3$ triangles in Base(K). Let us number some $an^3$ of them by $1, 2, \dots, an^3$
and let $I_i$ be the indicator random variable for the event that the ith triangle
is contained in $G(n, \xi p)$. Clearly, if K survives then $\sum_i I_i = 0$. But, by [16],
Theorem 2.18(ii),

$$\Pr\Big[\sum_i I_i = 0\Big] \le \exp\left\{-\frac{\big(\sum_i \mathbb{E} I_i\big)^2}{\sum_i \mathbb{E} I_i + \sum\sum_{i \ne j} \mathbb{E}(I_i I_j)}\right\},$$

where the double sum is over all ordered pairs of distinct, edge-intersecting
triangles in Base(K). Note that $\sum_i \mathbb{E} I_i = an^3(\xi p)^3 \ge a\xi^3 c^3 n^{3/2}$ and that the
double sum contains at most $n^4$ summands, each equalling at most
$(\xi p)^5 \le (\xi C)^5 n^{-5/2}$. Since $x^2/(x + D)$ is increasing in x for fixed D > 0,
the exponent is at least $2\tau_0 n^{3/2}$. This completes the proof. □
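
The calculation in the proof can be checked numerically. The sketch below is an illustration under two stated assumptions: that Janson's inequality is used in the form $\Pr[X = 0] \le \exp\{-\lambda^2/(\lambda + D)\}$, with $\lambda$ the expected number of triangles and D the double sum, and that $\tau_0 = a^2\xi^6c^6/(2a\xi^3c^3 + 2\xi^5C^5)$. The function names and the sample parameter values are ours.

```python
def janson_exponent(n, p, a, xi):
    """Lower bound on the exponent in the assumed form of Janson's inequality."""
    lam = a * n ** 3 * (xi * p) ** 3   # sum of E[I_i] over a*n^3 triangles
    pair_sum = n ** 4 * (xi * p) ** 5  # upper bound on the double sum
    return lam ** 2 / (lam + pair_sum)

def tau0(a, xi, c, C):
    """Assumed value of tau_0 (see the lead-in)."""
    return (a * xi ** 3 * c ** 3) ** 2 / (
        2 * a * xi ** 3 * c ** 3 + 2 * xi ** 5 * C ** 5
    )

# Hypothetical parameter values, chosen only for the demonstration.
n, a, xi, c, C = 10 ** 6, 0.01, 0.1, 0.5, 2.0
target = 2 * tau0(a, xi, c, C) * n ** 1.5
for b in (c, 1.0, C):
    p = b / n ** 0.5
    print(b, janson_exponent(n, p, a, xi) >= target)
```

For every p in the range $c/\sqrt{n} \le p \le C/\sqrt{n}$ the exponent dominates $2\tau_0 n^{3/2}$, which is what the survival bound requires.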

Let us now show how Lemmas 2.4 and 2.5 imply Theorem 2.1, and consequently
Theorem 1.1. We refer to the members of CORE by the name
cores.

Proof of Theorem 2.1: Lemma 2.4(a) implies that if $G \cup G(n, \xi p)$ has a
triangle-free coloring $\chi$ (which, of course, induces a triangle-free coloring of G)
then there exists a core which survives. But, using Lemma 2.4(b) with
$\tau = \tau_0$, Lemma 2.5 and a simple union bound, we deduce that with probability
1 - o(1) no core survives. Indeed,

$$\Pr[\text{any core survives}] \le \underbrace{\exp\{\tau_0 n^{3/2}\}}_{\#\text{ of cores}} \cdot \underbrace{\exp\{-2\tau_0 n^{3/2}\}}_{\text{survival probability}} = o(1).$$

Hence $\Pr[G \cup G(n, \xi p) \in \mathcal{R}] = 1 - o(1)$ as required. □



2.2 Overview of the Proof Strategy 

Clearly, at such an early point in the paper, the main idea of the proof 
may yet be obscure. Specifically, we have given no hint as to the connection 
between the existence of a magical graph M having the property described in 
assumption (3) of Theorem 2.1 and the existence of a family CORE satisfying 
the three conclusions of Lemma 2.4. In order to shed some light on this 
connection (albeit it will still be a dim light), we give here a short explanation 
of the logic and motivation behind the construction of CORE. The next 
three sections of the paper are devoted to the constructions that underlie the 
following scheme: 
• The existence of $M$ such that

$$\Pr[G \cup M^* \in \mathcal{R}] > 2\alpha$$

implies that the set of triangle-free colorings of G is very restricted; 
there are many sets of vertices in G such that planting a copy of M 
on them kills all triangle-free colorings, i.e. no triangle-free coloring 
of G can be extended to G U M when M is placed on one of the 
aforementioned "bad" sets. 

• We will associate every such "bad" set with a set of edges in G, a union 
of stars which we will call a special constellation. We will fix a coloring 
of these special constellations in such a way that every proper coloring 
of G must agree with every such colored constellation on at least one 
edge (see Lemma 3.2). 

• The above can be translated to the language of hypergraphs: we will 
construct a hypergraph whose hyperedges are the edge sets of the spe- 
cial constellations, and it will turn out that every triangle-free coloring 
gives rise to a hitting set of the hyperedges of this hypergraph (see 
Lemma 3.5). 

• We will show that every such hitting set (and hence every triangle-free 
coloring) may be associated with a large monochromatic set called a 
core. CORE is the family of cores (see Lemma 2.4 above). 

• The key to associating every hitting set with a core is in showing that 
our hypergraph has an inherent regular structure that may be revealed 
by a Szemeredi type partition (see Lemma 4.13). 

2.2.1 An Illustration 

To get a better feeling of how regularity helps in creating the family CORE, 
let us consider a simpler analogue that takes place in the well understood 
setting of graphs, in which special constellations will be replaced simply by 
edges. Hopefully this will give some clue as to what we are doing in the 
forthcoming sections. We refer the reader to Section 4 for the notion of an 
$\varepsilon$-regular graph and the statement of the Szemerédi Regularity Lemma.

Let $H = H(n)$ be a sequence of graphs on $n$ vertices. A cover set in a
graph is a set of vertices that intersects every edge. In other words, it is a
hitting set for the family of all edges of the graph. In general, the number of
cover sets in an $n$-vertex graph may be exponential in $n$, and our goal is to
"capture" all cover sets of $H$ by a smaller family of large subsets of vertices
which we will call cores. (Elsewhere in the paper the term core is used in our 
larger setting, cf. Lemma 2.4.) We want the following properties to hold for 
cores: 

• Every cover set contains a core, 

• The number of cores is $2^{o(n)}$,

• Every core is of size linear in n. 

In general, this is not an easy task, as the last two properties seem to con- 
tradict each other. 

Before describing how to construct the cores, here is a hint as to why one 
would want the last two conditions to hold simultaneously: in our general 
setting we wish to capture the large family of all triangle-free colorings of 
our graph G by a smaller family of partial colorings, hence a small family of 
cores. On the other hand, we wish every partial coloring to be sufficiently 
large to ensure that the probability of being able to extend it to a larger 
graph $G \cup G(n, \xi p)$ is very small, hence cores should be large.

Returning to the graph $H$, suppose that $H$ is a bipartite $\varepsilon$-regular graph on vertex sets $V_1, V_2$, with $|V_1| = |V_2| = n$ and density $d(V_1, V_2) = d > 0$. Recall that this means that for every $W_1 \subseteq V_1$ and $W_2 \subseteq V_2$ such that $|W_1| \ge \varepsilon n$ and $|W_2| \ge \varepsilon n$, the density of the subgraph between $W_1$ and $W_2$ is "$\varepsilon$-close" to $d$, and consequently, there is at least one edge between such $W_1$ and $W_2$. Hence any cover set must necessarily include at least $(1 - \varepsilon)n$ vertices from either $V_1$ or $V_2$. Then our family of cores may be formed by all sets of size $\lfloor (1 - \varepsilon)n \rfloor$ that are subsets of either $V_1$ or $V_2$. It is easy to see that every cover set contains a core, and the number of cores is

$$2\binom{n}{\lfloor (1 - \varepsilon)n \rfloor},$$

which indeed is $2^{o(n)}$, for $\varepsilon = o(1)$.
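As a rough numerical illustration of this count (added here as a sketch, not part of the original argument), the following Python snippet evaluates the number of sets of size $\lfloor (1 - \varepsilon)n \rfloor$ contained in $V_1$ or in $V_2$, together with the exponent $\log_2(\#\text{cores})/n$. The function names are hypothetical, and $\varepsilon$ is passed as a `Fraction` only to avoid floating-point rounding inside the floor.

```python
from fractions import Fraction
from math import comb, floor, log2


def number_of_cores(n, eps):
    """Count the sets of size floor((1 - eps) * n) lying inside V1 or inside V2,
    where |V1| = |V2| = n (the bipartite illustration)."""
    size = floor((1 - eps) * n)
    return 2 * comb(n, size)


def exponent_rate(n, eps):
    """log2(number of cores) / n; this is small when eps is small."""
    return log2(number_of_cores(n, eps)) / n


if __name__ == "__main__":
    for eps in (Fraction(1, 10), Fraction(1, 100)):
        print(eps, exponent_rate(1000, eps))
```

For fixed $n$ the exponent rate decreases as $\varepsilon$ decreases, which is the behavior behind the $2^{o(n)}$ bound for $\varepsilon = o(1)$.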

It is not hard to generalize this construction to the case of a multi-partite graph on vertex sets $V_1, \dots, V_k$ such that for most pairs $V_i, V_j$ the spanned bipartite graph is dense and $\varepsilon$-regular. Now comes the use of the Szemerédi Regularity Lemma: since every sufficiently large and dense graph is very close to being of this type, the original problem can essentially be reduced to this case.

After seeing this example, hopefully, the reader may have a feeling as to 
why we will eventually devote much energy to exposing the hidden regular 
structure in the hypergraph which expresses the restrictions on the triangle- 
free colorings of G. 

3 Tepees and Constellations 

Assumption (3) of Theorem 2.1 should imply that G is close to being Ramsey 
in the sense that its triangle-free colorings are quite restricted. In this section 
we set up a family of subgraphs of G called special constellations that help 
to capture these restrictions. We will show that every triangle-free coloring 
of G may be associated with a hitting set of this family. 

3.1 Tepees 

Our analysis of the restrictions imposed by (3) on the colorings of G will lead 
us rather naturally to the structures we shall call tepees. 

Assume that $M$, an arbitrary balanced graph with $\rho(M) = 2$, has $\nu$ vertices, and thus $2\nu$ edges, and fix a generic copy of $M$ with vertices labeled by $x_1, \dots, x_\nu$. We begin by defining a copy of $M$ "planted" on an ordered subset of vertices of $G$. For every sequence $X = (v_1, \dots, v_\nu)$ of distinct vertices of $G$, let $M_X$ be the copy of $M$ with vertex $x_a$ mapped onto $v_a$ for each $a = 1, \dots, \nu$. We will often identify a sequence $X$ with the set $\{v_1, \dots, v_\nu\}$ of vertices making up $X$.

The family of all sequences $X$ satisfying $G \cup M_X \in \mathcal{R}$ will be denoted by $\mathcal{X}$. Note that assumption (3) of Theorem 2.1 implies that

$$|\mathcal{X}| > 2\alpha n(n-1)\cdots(n-\nu+1) = (2 - o(1))\alpha n^\nu.$$

Let $\mathcal{X}_1 \subseteq \mathcal{X}$ be the family of all $X \in \mathcal{X}$ such that

(i) $X$ is an independent set in $G$, i.e. the vertices in $X$ span no edges of $G$, and

(ii) every vertex of $G$ has at most two neighbors in $X$.

Since $G \in \mathcal{Q}$ (see Definition 6.1, parts (P1) and (P2)), it follows that almost all $X$'s have properties (i) and (ii) above, and so we still have

$$|\mathcal{X}_1| > (2 - o(1))\alpha n^\nu.$$

Because of (i), the property $G \cup M_X \in \mathcal{R}$ indicates that there should be some triangles in $G \cup M_X$ with one edge in $M_X$ and two edges in $G$ (see the proof of Lemma 3.2 below). We now give a name to such structures.

Definition 3.1 For a pair of vertices $\{u, v\}$ of $G$, a tepee over $\{u, v\}$ is a pair of edges of $G$ of the form $\{uw, wv\}$. The vertex $w$ is called the tip of the tepee.

In other words, a tepee over $\{u, v\}$ is any path of length two in $G$ with endpoints $u$ and $v$. (Clearly, the vertices of a tepee over $\{u, v\}$ form a triangle in $G + uv$.) We will denote such a tepee by $uwv$ for short. For a sequence of vertices $X \in \mathcal{X}_1$, let $T(X)$ denote the set of all tepees in $G$ over those pairs of vertices from $X$ which are edges of $M_X$. By properties (i) and (ii) above, all tepees in $T(X)$ are pairwise edge-disjoint and have distinct tips.
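The definition of a tepee can be made concrete with a small Python sketch (an illustration under assumptions, not the paper's construction): the graph is stored as a hypothetical dictionary `adj` mapping each vertex to its set of neighbors, and the tips of the tepees over a pair are simply the common neighbors. The sketch also compares two ways of counting all tepees, by pairs and by tips.

```python
from itertools import combinations
from math import comb


def tepee_tips(adj, u, v):
    """Tips w of all tepees u-w-v over the pair {u, v}: common neighbors of u and v."""
    return sorted(adj[u] & adj[v])


def count_tepees_by_pairs(adj):
    """Sum of the number of tepees over each unordered pair {u, v}."""
    return sum(len(adj[u] & adj[v]) for u, v in combinations(sorted(adj), 2))


def count_tepees_by_tips(adj):
    """Each vertex w is the tip of C(deg(w), 2) tepees."""
    return sum(comb(len(neighbors), 2) for neighbors in adj.values())


if __name__ == "__main__":
    # The 4-cycle 0-1-2-3-0.
    adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    print(tepee_tips(adj, 0, 2))
    print(count_tepees_by_pairs(adj), count_tepees_by_tips(adj))
```

Both counts agree because every tepee is determined by its tip together with an unordered pair of neighbors of that tip.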

Furthermore, let $\hat{X}$ be the graph formed by the edges of tepees in $T(X)$. The vertex set of $\hat{X}$ consists of $X$ and the set of tips of all tepees in $T(X)$. (This notation is supposed to be suggestive of the tepees formed over $X$.) Set $t(X) = |T(X)|$. Given $t(X) = \phi$, $0 \le \phi \le n - \nu$, the graph $\hat{X}$ has $\nu + \phi$ vertices and $2\phi$ edges, and is isomorphic to a graph which can be obtained from $M_X$ by replacing some of its edges by multiple edges, erasing others, and finally replacing all edges of the obtained multigraph by internally disjoint paths of length two. So, there are many isomorphism types of $\hat{X}$ possible. The next lemma states that a positive fraction of $X$'s have the same isomorphism type of the graph $\hat{X}$, and, moreover, the number of vertices in $\hat{X}$ is bounded. (We have sacrificed $\alpha/5$ for the sake of global harmony.)

Lemma 3.1 Let $q = \lceil 10C^2\nu/\alpha \rceil$ and $\alpha_2 = \alpha/(2\nu + q)^q$. There exists an integer $\phi \le q$, a graph $\hat{M}$ on $\nu + \phi$ vertices, and a family $\mathcal{X}_2 \subseteq \mathcal{X}_1$ of size $|\mathcal{X}_2| = \alpha_2 n^\nu$ such that for all $X \in \mathcal{X}_2$, the graph $\hat{X}$ is isomorphic to $\hat{M}$.

Proof: We will first show that there are at most $\frac{4}{5}\alpha n^\nu$ sequences $X \in \mathcal{X}_1$ such that $t(X) > q$. For every pair of vertices $\{u, v\} \subseteq V$, let $T(u, v)$ be the set of tepees over $\{u, v\}$.
For a vertex $w \in V$ of degree $\deg(w)$ there are exactly $\binom{\deg(w)}{2}$ tepees of the form $uwv$. This and the fact that $G \in \mathcal{Q}$ and, in particular, that the degrees of all vertices in $G$ are bounded from above by $2C\sqrt{n}$ (see Definition 6.1 (P3)), yields

$$\sum_{\{u,v\}} |T(u,v)| = \sum_w \binom{\deg(w)}{2} \le n\binom{2C\sqrt{n}}{2} < 2C^2n^2.$$

To apply a simple counting argument note that for all $\{u, v\} \subseteq V$

$$|\{X : uv \in M_X\}| \le 4\nu n^{\nu-2}.$$

Consequently, by reversing the order of summation,

$$\sum_X t(X) = \sum_X \sum_{uv \in M_X} |T(u,v)| \le 4\nu n^{\nu-2} \sum_{\{u,v\}} |T(u,v)| < 8C^2\nu n^\nu.$$

This immediately implies that $t(X) > q$ holds for at most $\frac{4}{5}\alpha n^\nu$ sets $X \in \mathcal{X}_1$, and, therefore, at least

$$|\mathcal{X}_1| - \frac{4}{5}\alpha n^\nu > \alpha n^\nu$$

sets $X \in \mathcal{X}_1$ have $t(X) \le q$.

Given that $t(X) \le q$, the number of isomorphism types possible for the graph $\hat{X}$ is no more than the number of ordered partitions of the integer $q$ into $2\nu + 1$ nonnegative parts, corresponding to the decisions of how many tepees span over each edge of $M_X$. (The $(2\nu + 1)$-st part is the difference between $q$ and the actual number of tepees present.) There are $\binom{2\nu + q}{q} < (2\nu + q)^q$ such partitions. Take as $\hat{M}$ the most common isomorphism type among $\{\hat{X} : X \in \mathcal{X}_1, t(X) \le q\}$ and set $\phi = |V(\hat{M})| - \nu$ and

$$\mathcal{X}_2 = \{X \in \mathcal{X}_1 : \hat{X} \text{ is isomorphic to } \hat{M}\}.$$

Then

$$|\mathcal{X}_2| \ge \frac{\alpha}{(2\nu + q)^q}\, n^\nu = \alpha_2 n^\nu.$$

□
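The counting step in the proof, namely that the ordered partitions of $q$ into $2\nu + 1$ nonnegative parts number $\binom{2\nu + q}{q}$, can be checked by brute force for small parameters. The following Python sketch is only a sanity check under that reading; the function name and the chosen values of $\nu$ and $q$ are hypothetical.

```python
from itertools import product
from math import comb


def count_compositions(total, parts):
    """Brute-force count of ordered tuples of `parts` nonnegative integers summing to `total`."""
    return sum(1 for c in product(range(total + 1), repeat=parts) if sum(c) == total)


if __name__ == "__main__":
    nu, q = 1, 3
    brute = count_compositions(q, 2 * nu + 1)
    print(brute, comb(2 * nu + q, q), (2 * nu + q) ** q)
```

The brute-force count matches the binomial coefficient, which in turn is smaller than the cruder bound $(2\nu + q)^q$ used to define $\alpha_2$.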

Let us summarize the properties of the family $\mathcal{X}_2$. For every $X \in \mathcal{X}_2$

• $G \cup M_X \in \mathcal{R}$,

• $X$ is an independent set,

• No three vertices in $X$ share a common neighbor in $G$, and hence $T(X)$ is composed of edge-disjoint tepees with distinct tips,

• $\hat{X}$ is isomorphic to $\hat{M}$, a graph with $\nu + \phi$ vertices and $2\phi$ edges, for some $\phi \le q$.

By Lemma 2.2, $M$ has a triangle-free coloring, that is, a blue-red coloring of its edges with no monochromatic triangle. Fix one such coloring and call it $\sigma'$. For each $X \in \mathcal{X}_2$, color $M_X$, the copy of $M$ planted on $X$, by $\sigma'$, and denote the so colored copy by $M'_X$. This partitions the tepees in $T(X)$ into two sets, $RT(X)$ and $BT(X)$, as follows: $RT(X)$ is the set of tepees in $G$ over the red edges of $M'_X$ and $BT(X)$ is the set of tepees over the blue edges of $M'_X$.

We broaden the definition of property $\mathcal{R}$ to partially colored graphs by saying that a partially colored graph $F$ belongs to $\mathcal{R}$ if there is no triangle-free coloring of all edges of $F$ consistent with the partial coloring. Clearly, if $F \in \mathcal{R}$ then $F$ with any partial coloring of $F$ belongs to $\mathcal{R}$ too. Hence, by the first property of $\mathcal{X}_2$ listed above, for all $X \in \mathcal{X}_2$

$$G \cup M'_X \in \mathcal{R}. \qquad (5)$$

We now make a simple but crucial observation which captures in terms of 
the tepees the restrictions on proper colorings of G imposed by (3). 

Lemma 3.2 For every triangle-free coloring $\chi$ of $E(G)$ and every $X \in \mathcal{X}_2$, there is either a tepee in $RT(X)$ colored red or a tepee in $BT(X)$ colored blue.

Proof: Let $X \in \mathcal{X}_2$. By (5), for every triangle-free coloring $\chi$ of $E(G)$ there is a monochromatic triangle in $G \cup M'_X$. By the definition of $\sigma'$, we know that $M'_X$ itself does not contain such a triangle. On the other hand, by the definition of $\mathcal{X}_2$, the vertices in $X$ do not span any edges of $G$. Therefore, we conclude that the monochromatic triangle contains exactly one edge of $M_X$. The other two edges form a tepee guaranteed by the lemma. □
We would now like to pre-color a subset of edges of $G$ according to whether they belong to the tepees from $RT(X)$ or $BT(X)$ for some $X \in \mathcal{X}_2$. Based on the above lemma, we could then conclude that every proper coloring of $E(G)$ agrees with the pre-coloring on a set of edges which intersects every subgraph $\hat{X}$ on at least two edges, namely, the two edges making up a tepee.

The problem with this approach is that we cannot assign a color to an edge $e$ in $G$ once and for all, because $e$ may belong to a tepee in $RT(X_1)$ and at the same time to a tepee in $BT(X_2)$ for some $X_1 \ne X_2$. To remedy this obstacle we now move to a more restricted family $\mathcal{X}_3 \subseteq \mathcal{X}_2$ for which no such clashes occur.

Recall that $V(M) = \{x_1, \dots, x_\nu\}$. Let us label the remaining vertices of the prototype graph $\hat{M}$ by $u_1, \dots, u_\phi$. For each $X = (v_1, \dots, v_\nu) \in \mathcal{X}_2$ choose an isomorphism $f$ between $\hat{M}$ and $\hat{X}$ which maps $x_a$ onto $v_a$ for all $a = 1, \dots, \nu$, and then set $w_b = f(u_b)$ for all $b = 1, \dots, \phi$.

Assuming that $n$ is divisible by $\nu + \phi$, let $\pi = \{V_1, \dots, V_\nu, W_1, \dots, W_\phi\}$ be a partition of $V = V(G)$ into $\nu + \phi$ parts of equal size $m = n/(\nu + \phi)$. (In reality, we put aside an arbitrary set of less than $\nu + \phi$ vertices, which will have only a negligible effect on the estimates in our proofs.)

We will call a subgraph $\hat{X}$ consistent with $\pi$ if, under the above notational convention, $v_a \in V_a$ for $1 \le a \le \nu$, and $w_b \in W_b$ for $1 \le b \le \phi$. (See, e.g., Figure 2.) Our next lemma establishes the existence of a partition with respect to which a positive fraction of subgraphs $\hat{X}$ will be consistent. The attached degree constraint will be utilized only in Sections 4 and 5.

Lemma 3.3 There exists a partition $\pi$ as above and a family $\mathcal{X}_3 \subseteq \mathcal{X}_2$ with $|\mathcal{X}_3| = \alpha_3 n^\nu$, where $\alpha_3 = \alpha_2/(2(\nu + q)^{\nu + q})$, such that for every $X \in \mathcal{X}_3$, the subgraph $\hat{X}$ is consistent with $\pi$. Moreover, for every vertex $v \in V_1 \cup \cdots \cup V_\nu$
and for each b = 1, . . . , 0, we have ^mp < degc(v, Wb) < \mp, where m = 
n/(u + 0). 

Proof: Choose an ordered partition of the vertices of $V$ into $\nu + \phi$ parts, uniformly at random from all such partitions. For a given subgraph $\hat{X}$ the probability that it is consistent with the chosen partition is precisely

$$\frac{m^{\nu + \phi}}{n(n-1)\cdots(n - \nu - \phi + 1)} > (\nu + \phi)^{-(\nu + \phi)}.$$

Therefore the expected number of subgraphs $\hat{X}$ with $X \in \mathcal{X}_2$ that are not consistent with the random partition is at most

$$|\mathcal{X}_2|\left(1 - (\nu + \phi)^{-(\nu + \phi)}\right) \le (\alpha_2 - 2\alpha_3)n^\nu.$$

By Markov's inequality, with probability at least $\alpha_3/(\alpha_2 - \alpha_3)$ at least $\alpha_3 n^\nu$ such subgraphs are consistent with the random partition. On the other hand, Chernoff's inequality for hypergeometric distributions (see, e.g., [16], Theorem 2.10) and Property (P3) yield that the degree constraint is satisfied with probability $1 - o(1)$. Hence, the existence of a required partition follows.
□
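The probability used at the start of the proof can be evaluated exactly for small parameters. The Python sketch below is an illustration only: it assumes a uniformly random ordered partition into equal parts of size $m = n/(\nu + \phi)$, and the function name and the parameter `parts` (standing for $\nu + \phi$) are hypothetical.

```python
from fractions import Fraction


def consistency_probability(n, parts):
    """Probability that `parts` fixed distinct vertices land in `parts` prescribed,
    distinct classes of a uniformly random ordered equipartition of n vertices.
    Equals m**parts / (n (n-1) ... (n-parts+1)) with m = n / parts."""
    if n % parts != 0:
        raise ValueError("n must be divisible by the number of parts")
    m = n // parts
    prob = Fraction(1)
    for i in range(parts):
        # The (i+1)-st vertex must fall into its own class, which still has m free slots
        # among the n - i positions not yet taken.
        prob *= Fraction(m, n - i)
    return prob


if __name__ == "__main__":
    p = consistency_probability(12, 3)
    print(p, Fraction(1, 3 ** 3), p > Fraction(1, 3 ** 3))
```

In this small instance the exact probability exceeds the lower bound $(\nu + \phi)^{-(\nu + \phi)}$, in line with the inequality stated in the proof.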

Let $\pi_0 = \{V_1, \dots, V_\nu, W_1, \dots, W_\phi\}$ be a partition guaranteed by Lemma 3.3 and let $\mathcal{X}_3 \subseteq \mathcal{X}_2$ be the corresponding family. We have now finished refining the initial family $\mathcal{X}$. The new subfamily $\mathcal{X}_3$ is the final one we will work with. Also, the partition $\pi_0$ will now be fixed for the rest of the paper, providing a starting point for our further constructions.

3.2 Constellations 

Note that for all distinct $X_1, X_2 \in \mathcal{X}_3$ we have $RT(X_1) \cap BT(X_2) = \emptyset$. In view of this and of Lemma 3.2, we are now in a position to assign a pre-coloring to the edges of $\bigcup_{X \in \mathcal{X}_3} \hat{X}$ in such a way that every triangle-free coloring of $G$ agrees with the pre-coloring on at least two edges (which form a tepee) of $\hat{X}$, for every $X \in \mathcal{X}_3$. However, for our purposes it is sufficient to concentrate on such an agreement even on one edge of each $\hat{X}$. Because of this excess, we now simplify the structure of subgraphs we will be dealing with through the rest of the paper.

For every $X \in \mathcal{X}_3$, instead of $\hat{X}$ we will consider a certain subgraph $S(X)$ of $\hat{X}$, called a special constellation. (See Figure 3.) We will shortly define this new notion formally, but for now we note that $S(X)$ is a star forest (a disjoint union of stars) obtained by erasing one edge of every tepee in $T(X)$. The reason we now shift from tepees to constellations is that we will need various estimates on their number, and this boils down to counting star forests, a task much simpler than counting copies of $\hat{M}$ which may contain many cycles.

Let $X = (v_1, \dots, v_\nu) \in \mathcal{X}_3$ and let $v_av_{a'}$ be an edge in $M_X$ with $a < a'$. For a tepee $v_awv_{a'}$ over $\{v_a, v_{a'}\}$ in $G$ we will call $v_aw$ the left leg of the tepee and $v_{a'}w$ the right leg of the tepee. Once we can distinguish between the two edges of every tepee in $T(X)$, it is easy to define $S(X)$.

Figure 2: subgraphs $\hat{X}$ consistent with $\pi_0$

Definition 3.2 For each $X \in \mathcal{X}_3$, let $S(X)$ be the spanning subgraph of $\hat{X}$ whose edges are the left legs of all tepees in $T(X)$. The subgraphs $S(X)$, $X \in \mathcal{X}_3$, will be called special constellations.

Since for all $X \in \mathcal{X}_3$ the subgraph $\hat{X}$ is isomorphic to $\hat{M}$, it follows that for all $X \in \mathcal{X}_3$, constellations $S(X)$ have the same isomorphism type.

Definition 3.3 Let $S$ denote the common isomorphism type of all special constellations $S(X)$. That is, $S$ is a subgraph of $\hat{M}$ with $V(S) = V(\hat{M})$, which is a union of $\nu$ vertex disjoint stars, each centered at one of the vertices of $M$. Their degrees will be denoted by $\phi_1, \dots, \phi_\nu$, where $\phi = \sum_a \phi_a$, and $\phi_\nu = 0$ (because $x_\nu$ is never adjacent to the left leg of a tepee containing it).

It will now be convenient to relabel the vertices of $S$ and denote the neighbors of $x_a$ in $S$ by $u_{ab}$, where $b = 1, \dots, \phi_a$. Accordingly, we modify the convention from the previous subsection; namely, we label the vertices of any special constellation $S(X)$ by $v_a$ and $w_{ab}$ in such a way that there is an isomorphism between $S$ and $S(X)$ which maps $x_a$ onto $v_a$ and $u_{ab}$ onto $w_{ab}$ for all $a = 1, \dots, \nu$ and $b = 1, \dots, \phi_a$. Finally, also to conform with this new notation, let us relabel the sets of the partition $\pi_0$:

$$\pi_0 = \{V_1, \dots, V_\nu, W_{11}, \dots, W_{1\phi_1}, \dots, W_{\nu-1,1}, \dots, W_{\nu-1,\phi_{\nu-1}}\},$$

so that for every $X \in \mathcal{X}_3$, the special constellation $S(X)$ satisfies $v_a \in V_a$ and $w_{ab} \in W_{ab}$ for all $a = 1, \dots, \nu - 1$ and $b = 1, \dots, \phi_a$.

Definition 3.4 We say that a copy $S_0$ of $S$ is consistent with $\pi_0$ if there is an isomorphism $f : S \to S_0$ for which $f(x_a) \in V_a$ and $f(u_{ab}) \in W_{ab}$ for all $a = 1, \dots, \nu$ and $b = 1, \dots, \phi_a$.

Clearly, all special constellations $S(X)$ are consistent with $\pi_0$, but there may be many other copies of $S$ in $G$ which are consistent with $\pi_0$ too. We will simply call them constellations.

Definition 3.5 A constellation is any copy $S_0$ of $S$ in $G$ which is consistent with the partition $\pi_0$. A special constellation is a copy $S_0$ of $S$ in $G$ for which there exists $X \in \mathcal{X}_3$ such that $S(X) = S_0$. The set of all constellations in $G$ will be denoted by $\mathcal{C}$, and the set of all special constellations will be denoted by $\mathcal{S}$.

Note that the above definition of special constellations is consistent with the previously stated Definition 3.2.

Although we claim that removing the right leg of each tepee will make our lives easier, we will need to recall them several times in the paper. This will be done via the following "Missing Leg Property". To state it formally, we introduce another piece of handy notation related to the indexing of the vertices of every $S(X)$, or equivalently, of $S$. Define $a'(ab)$ to be the unique index $a'$ such that $x_au_{ab}x_{a'}$ is a tepee in $\hat{M}$. This is well defined because the tepees composing $\hat{M}$ are pairwise edge disjoint.
Figure 3: Tepees over $X$, with and without right legs - creation of a special constellation $S$

Observation 3.4 ("The Missing Leg Property") Let $X \in \mathcal{X}_3$.

(a) For every edge $v_aw_{ab}$ of $S(X)$ the pair $w_{ab}v_{a'(ab)}$ is an edge of $G$ ("the missing leg").

(b) $v_a$ and $v_{a'(ab)}$ are the only vertices in $X$ which are neighbors of $w_{ab}$ in $G$ (only one missing leg per edge in $S(X)$).

(c) The edges of $S(X)$ together with the missing legs form all the tepees in $T(X)$. In other words there are no other tepees over the edges of $M_X$.

So, the structural property that distinguishes special constellations from the ordinary ones is the "Missing Leg Property" that they inherit as offspring of the $X$'s. The most important inheritance, however, is the color-related property to be formulated in Lemma 3.5 below.
But first let us define a partial coloring $\sigma$ of $E(G)$ which will play a crucial role in our proof. It will assign colors only to the edges of $\bigcup_{X \in \mathcal{X}_3} S(X)$. To begin with, recall that $\sigma'$ is a fixed, triangle-free coloring of $E(M)$, and define an auxiliary coloring of the pairs $(a, b)$, $a = 1, \dots, \nu$, $b = 1, \dots, \phi_a$, by

$$\sigma(a, b) = \sigma'(x_ax_{a'(ab)}).$$

Now, for every edge $vw \in \bigcup_{X \in \mathcal{X}_3} S(X)$ with $v \in V_a$ and $w \in W_{ab}$, let $\sigma(vw) = \sigma(a, b)$. Note that for any $X \in \mathcal{X}_3$ and any edge $e \in S(X)$, $\sigma(e)$ is red or blue according to whether $e$ belongs to a tepee in $RT(X)$ or $BT(X)$. Note also that the color $\sigma(e)$ is determined by the partition classes of $\pi_0$ to which the endpoints of $e$ belong.

The partial coloring $\sigma$ is a lens through which we can study complete colorings of $G$. For a coloring $\chi$ of the edges of $G$, let

$$\mathrm{Agree}(\chi, \sigma) = \Big\{ e \in \bigcup_{X \in \mathcal{X}_3} S(X) : \chi(e) = \sigma(e) \Big\}. \qquad (6)$$

Now comes a crucial observation that justifies the whole construction erected 
in this section. 

Lemma 3.5 Let $\chi$ be a triangle-free coloring of the edges of $G$. Then Agree($\chi, \sigma$) is a hitting set of $\mathcal{S}$, the family of special constellations.

Proof: We must show that for every $X \in \mathcal{X}_3$ there exists an edge $e \in S(X)$ such that $\sigma(e) = \chi(e)$. Lemma 3.2 states that for every $X \in \mathcal{X}_3$ and for every triangle-free coloring $\chi$, there exists either a tepee in $RT(X)$ whose edges are colored red by $\chi$, or a tepee in $BT(X)$ whose edges are colored blue. The left leg of such a tepee is an edge of $S(X)$ for which $\sigma$ and $\chi$ agree, by the definition of $\sigma$.

□
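The two notions used in this lemma, the agreement set of a full coloring with a partial coloring and the hitting-set property, are easy to state in code. The Python sketch below is a toy illustration with hypothetical names and data: colorings are dictionaries from edges (tuples of vertex labels) to color names, and a family of constellations is a list of edge sets. It does not construct the partial coloring from the tepees; it only shows what is being checked.

```python
def agree(chi, sigma):
    """Edges colored by the partial coloring sigma on which the coloring chi uses the same color."""
    return {e for e, color in sigma.items() if chi.get(e) == color}


def is_hitting_set(candidate, family):
    """True if `candidate` intersects every edge set in `family`."""
    return all(candidate & edge_set for edge_set in family)


if __name__ == "__main__":
    # Hypothetical toy data: one constellation with two edges.
    sigma = {("v1", "w11"): "red", ("v2", "w21"): "blue"}
    chi = {("v1", "w11"): "blue", ("v2", "w21"): "blue", ("w11", "v2"): "red"}
    constellations = [{("v1", "w11"), ("v2", "w21")}]
    print(agree(chi, sigma), is_hitting_set(agree(chi, sigma), constellations))
```

Here the two colorings agree on a single edge of the constellation, which is already enough for the agreement set to hit it.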

We may summarize this section as follows: we have shown that the triangle-free colorings of $G$ are somewhat structured, in the sense that any such coloring must "contain" a hitting set of the family of special constellations (see Lemma 3.5). As it turns out, this structure imposed on triangle-free colorings is quite restrictive; in Section 5, we will capture this restriction in a convenient way. (We are not able to say at this point what we mean by this precisely; let us simply appeal to an analogy and refer the reader to the "cores" from Section 2.2.1, which capture a strong structural property of cover sets of $\varepsilon$-regular bipartite graphs.) In what follows, it will be convenient to consider the hypergraph whose vertices are the edges of $G$ and the hyperedges are the edge sets of the special constellations. Lemma 3.5 tells us that triangle-free colorings of $G$ give rise to hitting sets of this hypergraph (or cover sets in the terminology of Section 2.2.1). In Section 4, we will discuss how to "regularize" this hypergraph; from this regularization we will be able to deduce that the structural property imposed on triangle-free colorings of $G$ is indeed quite strong (again, recall the discussion in Section 2.2.1).



4 Regularity 

In this section we state and prove a regularity theorem that is the central 
technical tool in this paper. It is a generalization of the celebrated Szemeredi 
Regularity Lemma to a setting in which the density is measured not merely 
by the number of edges but with respect to the number of copies of certain 
subgraphs. 



4.1 Classical Regularity 

In this subsection we introduce all the necessary results concerning the back- 
ground on standard graph regularity. See Section 1.4 for more information 
on regularity. 



4.1.1 Dense Regular Graphs 

In the classical regularity lemma of Szemeredi [35] a crucial parameter is the 
density of a bipartite graph defined as a ratio of the number of edges of the 
graph to the number of all potential edges. 

In the following, B stands for a bipartite graph with bipartition (U, V) . 

Definition 4.1 Let $U' \subseteq U$ and $V' \subseteq V$. The density of the pair $(U', V')$ is defined as

$$d(U', V') = \frac{e_B(U', V')}{|U'||V'|},$$

where $e_B(U', V')$ is the number of edges of $B$ with one endpoint in $U'$ and the other in $V'$ (such edges are said to belong to the pair $(U', V')$). We will sometimes write $d(B)$ for $d(U, V)$.

The $\varepsilon$-regularity of $B$ reflects low discrepancy between the densities of large, induced subgraphs of $B$.

Definition 4.2 Let $\varepsilon > 0$. A bipartite graph $B$ is called $\varepsilon$-regular, if for each choice of $U' \subseteq U$ and $V' \subseteq V$, with $|U'| \ge \varepsilon|U|$ and $|V'| \ge \varepsilon|V|$, we have

$$|d(U', V') - d(U, V)| \le \varepsilon.$$

Sometimes the pair $(U, V)$ itself is called $\varepsilon$-regular.
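For very small bipartite graphs, the two definitions above can be checked directly by enumerating all admissible pairs of subsets. The following Python sketch is a brute-force illustration only (exponential in the number of vertices); it assumes the conventions $|U'| \ge \varepsilon|U|$, $|V'| \ge \varepsilon|V|$ and a deviation of at most $\varepsilon$, and it assumes that edges are stored as pairs `(u, v)` with `u` on the $U$ side. All function names are hypothetical.

```python
from itertools import combinations
from math import ceil


def density(edges, U, V):
    """d(U', V') = e_B(U', V') / (|U'| |V'|) for non-empty vertex sets U', V'."""
    U, V = set(U), set(V)
    e = sum(1 for (u, v) in edges if u in U and v in V)
    return e / (len(U) * len(V))


def large_subsets(S, min_size):
    """All subsets of S with at least max(min_size, 1) elements."""
    S = sorted(S)
    for r in range(max(min_size, 1), len(S) + 1):
        yield from combinations(S, r)


def is_eps_regular(edges, U, V, eps):
    """Brute-force test of the regularity condition over all large subset pairs."""
    d = density(edges, U, V)
    for U1 in large_subsets(U, ceil(eps * len(U))):
        for V1 in large_subsets(V, ceil(eps * len(V))):
            if abs(density(edges, U1, V1) - d) > eps:
                return False
    return True


if __name__ == "__main__":
    complete = {(u, v) for u in "ab" for v in "xy"}
    matching = {("a", "x"), ("b", "y")}
    print(is_eps_regular(complete, "ab", "xy", 0.1))
    print(is_eps_regular(matching, "ab", "xy", 0.4))
```

The complete bipartite graph passes the test because every sub-pair has density 1, while the perfect matching fails it for $\varepsilon = 0.4$, since single-vertex sub-pairs have density 0 or 1 against an overall density of 1/2.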

The following result from [5] and [2] gives a sufficient criterion for regularity in terms of vertex degrees and co-degrees only.

Lemma 4.1 Let $\varepsilon > 0$ and $d(U, V) = d > 2\varepsilon$. Assume further that $|U| > 2/\varepsilon$ and denote by $W$ the set of all pairs $\{v, v'\}$ of vertices of $U$ for which

(i) $\deg(v), \deg(v') > (d - \varepsilon)|V|$, and

(ii) $\mathrm{codeg}(v, v') < (d + \varepsilon)^2|V|$.

If $|W| > \frac{1}{2}(1 - 5\varepsilon)|U|^2$, then $B$ is $\varepsilon'$-regular, where $\varepsilon' = (16\varepsilon)^{1/5}$.

The next, simple fact is a direct consequence of Definition 4.2.

Observation 4.2 Let $\varepsilon' > \varepsilon > 0$. If $B$ is $\varepsilon$-regular, and $V' \subseteq V$, with $|V'| = \varepsilon'|V|$, then the pair $(U, V')$ is $\varepsilon''$-regular, where $\varepsilon'' = \varepsilon/\varepsilon'$.

One of the basic applications of $\varepsilon$-regularity is for counting copies of a given graph in a structure composed of several $\varepsilon$-regular and dense pairs of vertex sets. We do not give references, as the result seems to belong to the "local folklore".

Proposition 4.3 For every $\varepsilon > 0$, $0 < d < 1$ and an integer $k$ such that $\varepsilon < d^k$ the following is true. Let $V_i$, $i = 1, \dots, k$, be disjoint vertex sets with $\min|V_i| \ge n_0 = (2/d)^k$ and let $B_{ij}$, $1 \le i < j \le k$, be $\varepsilon$-regular graphs with bipartitions $(V_i, V_j)$ and densities $d(B_{ij}) = d_{ij}$, where $\min d_{ij} \ge d$. Then the number $\#(K_k, \bigcup_{i,j} B_{ij})$ of complete graphs $K_k$ contained in the union $\bigcup_{i,j} B_{ij}$ satisfies, with $\varepsilon' = \varepsilon d^{-k}$,
$$\#\Big(K_k, \bigcup_{i,j} B_{ij}\Big) \overset{\varepsilon'}{\sim} \prod_{i=1}^{k} |V_i| \prod_{i,j} d_{ij}.$$

If a less precise estimate of the copies of $K_k$ is needed, then the $\varepsilon$-regularity can be replaced with a less restrictive notion. We say that a graph $G$ is $(\varrho, d)$-dense if for all $U \subseteq V(G)$ with $|U| \ge \varrho|V(G)|$, we have $e_G(U) \ge d\binom{|U|}{2}$. When verifying this property it suffices to check only all subsets $U$ of size $|U| = \lceil \varrho|V(G)| \rceil$. The next result comes from [31].

Proposition 4.4 For all $d > 0$ and $k$ there exist $\varrho, n_0, c_0 > 0$ such that for every $(\varrho, d)$-dense graph $G$ with $n = |V(G)| \ge n_0$ we have $\#(K_k, G) \ge c_0n^k$.

Although we do not need it in our proofs, we now state the celebrated Szemerédi Regularity Lemma. This is not exactly the original version, but one which can be compared with the sparse version from the next subsection. Recall the definition of equipartition given at the end of Section 1.

Theorem 4.5 (Szemerédi Regularity Lemma) For all $\varepsilon > 0$ and integers $t_0$ and $r$, there exist $T_0$ and $n_0$ such that if $G_1, \dots, G_r$ are graphs with a common vertex set $V$, $|V| = n \ge n_0$, then every equipartition of $V$ into $t_0$ parts has a refined equipartition $V = W_1 \cup \dots \cup W_t$, where $t_0 \le t \le T_0$, and all but at most $\varepsilon n^2$ edges of $G_1 \cup \dots \cup G_r$ belong to the pairs $\{W_i, W_j\}$ that are $\varepsilon$-regular with respect to all graphs $G_i$, $i = 1, \dots, r$.

4.1.2 Sparse Regular Graphs 

The graph $G$, one of the main constituents of the proof of Theorem 2.1, is relatively sparse, with density of order $1/\sqrt{n}$. Below we present a version of the regularity lemma designed for sequences of graphs with $n$ vertices and density $p = o(1)$.

As before, let $B$ be a bipartite graph with bipartition $(U, V)$, and let $0 < p \le 1$.

Definition 4.3 Let $U' \subseteq U$ and $V' \subseteq V$. The $p$-density of the pair $(U', V')$ is defined as

$$d_p(U', V') = \frac{e_B(U', V')}{p|U'||V'|}.$$

We will sometimes write $d_p(B)$ for $d_p(U, V)$.

As in the dense case, the $(\varepsilon, p)$-regularity of $B$ (as defined below) reflects low discrepancy between the $p$-densities of large, induced subgraphs of $B$.

Definition 4.4 Let $\varepsilon > 0$ and $0 < p \le 1$. A bipartite graph $B$ is $(\varepsilon, p)$-regular if for each choice of $U' \subseteq U$ and $V' \subseteq V$, with $|U'| \ge \varepsilon|U|$ and $|V'| \ge \varepsilon|V|$, we have

$$|d_p(U', V') - d_p(U, V)| \le \varepsilon.$$

Sometimes the pair $(U, V)$ itself is called $(\varepsilon, p)$-regular.

There is an analog of Observation 4.2 in the sparse case.

Observation 4.6 Let $\varepsilon' > \varepsilon > 0$ and $0 < p \le 1$. If $B$ is $(\varepsilon, p)$-regular, and $V' \subseteq V$, $|V'| = \varepsilon'|V|$, then the pair $(U, V')$ is $(\varepsilon'', p)$-regular, where $\varepsilon'' = \varepsilon/\varepsilon'$.

A basic feature of $\varepsilon$-regular graphs is that most of the vertex degrees are under control. Here is a useful fact exploited in Section 6.

Proposition 4.7 If $(U, V)$ is $(\varepsilon, p)$-regular and $W \subseteq V$ satisfies $|W| \ge \varepsilon|V|$, then at least $(1 - \varepsilon)|U|$ vertices of $U$ have each at least $(d_p(U, V) - \varepsilon)p|W|$ neighbors in $W$.

The above notions of $p$-density and $(\varepsilon, p)$-regularity extend naturally to pairs of disjoint subsets of vertices of non-bipartite graphs, for which we now introduce one more notion, specific just to the sparse case.

Definition 4.5 For $D > 0$ and $0 < p \le 1$ we say that a graph $F$ on $n$ vertices has no $(D, p)$-dense patches, if for every two disjoint sets of vertices $W_1, W_2 \subseteq V(F)$ with $|W_1|, |W_2| \ge n/\log n$, we have

$$d_p(W_1, W_2) \le D.$$
This property, called $(p; D, 1/\log n)$-boundedness in [16], page 215, guarantees that the so-called partition index is bounded from above, and thus allows one to repeat the proof of the Szemerédi Regularity Lemma mutatis mutandis.

But there is another profit from this assumption. In general, there is no simple analog of Proposition 4.3 in the sparse case. However, there are several places in this paper where we need estimates of the number of stars consistent with a certain family of disjoint subsets of vertices. Fortunately, in this simple case a similar counting result is true, provided some kind of boundedness is assumed. We prove it here in a specific form suitable for our purposes.

Proposition 4.8 For every $\varepsilon > 0$, $D > 0$, and integer $k \ge 2$ such that $\varepsilon < \min\{k^{-4}, 2^{-6}D^{-6k}\}$ there exists $n_0$ for which the following is true. Let $V_0$, $W_j$, $j = 1, \dots, k$, be disjoint vertex sets of sizes $n_0 \le |W_j|/2 \le |V_0| \le |W_j|$, $n = |V_0| + |W_1| + \dots + |W_k|$ and $0 < p = p(n) \le 1$. Let $B_j$, $1 \le j \le k$, be $(\varepsilon, p)$-regular graphs with bipartitions $(V_0, W_j)$ and $p$-densities $d_p(B_j) = d_j$, where $\min d_j \ge \varepsilon^{1/(2k)}$. Assume further that the graphs $B_j$ have no $(D, p)$-dense patches and that the maximum degree in the graph $\bigcup_j B_j$ is at most $Dnp$. Let $\#(S_k, \bigcup_j B_j)$ be the number of copies of the star $S_k$ contained in $\bigcup_j B_j$ and such that the vertex of degree $k$ is in $V_0$ and the vertices of degree one each belong to a distinct set $W_j$. Then we have,

k 

#(S k ,\jBj) e ~ \VQ\ V h \[\Wj\dj. 

j 3=1 

Proof: For any subset $V$ of $V_0$, set $S(V) = \#(S_k, \bigcup_j B'_j)$, where $B'_j$ is the subgraph of $B_j$ induced by the vertex set $V \cup W_j$. Let $\deg_j(v) = \deg_{B_j}(v)$ be the number of neighbors of vertex $v$ in graph $B_j$. Clearly,

$$S(V) = \sum_{v \in V} \prod_{j=1}^{k} \deg_j(v),$$

so we would be done if we knew that all, not just almost all, vertices of $V_0$ satisfy $\deg_j(v) \sim d_jp|W_j|$ for all $j$. Unfortunately, this is not so, and we have to somehow handle the exceptional degrees. Of course, for a lower bound on $S(V_0)$ we can just leave them out. Let

$$V^j_{\mathrm{small}} = \{v \in V_0 : \deg_j(v) < (d_j - \varepsilon)p|W_j|\}.$$
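By the $(\varepsilon, p)$-regularity of $B_j$ it follows that $|V^j_{\mathrm{small}}| \le \varepsilon|V_0|$. Let

$$V_{\mathrm{small}} = \bigcup_{j=1}^{k} V^j_{\mathrm{small}}.$$

Then

$$\begin{aligned}
S(V_0) &\ge S(V_0 \setminus V_{\mathrm{small}}) = \sum_{v \in V_0 \setminus V_{\mathrm{small}}} \prod_{j=1}^{k} \deg_j(v) \ge \sum_{v \in V_0 \setminus V_{\mathrm{small}}} \prod_{j=1}^{k} (d_j - \varepsilon)p|W_j| \\
&\overset{(*)}{\ge} |V_0|(1 - k\varepsilon)p^k \prod_{j=1}^{k} d_j(1 - \varepsilon^{3/4})|W_j| \ge |V_0|(1 - k\varepsilon)(1 - \varepsilon^{3/4})^k p^k \prod_{j=1}^{k} d_j|W_j| \\
&\ge |V_0|\left(1 - (k+1)\varepsilon^{3/4}\right)p^k \prod_{j=1}^{k} d_j|W_j| \overset{(**)}{\ge} (1 - \varepsilon^{1/3})|V_0|p^k \prod_{j=1}^{k} d_j|W_j|.
\end{aligned}$$

Note that for inequality (*) above we use $d_j \ge \varepsilon^{1/4}$, which follows from $d_j \ge \varepsilon^{1/(2k)}$ and $k \ge 2$. The assumption $1/k > \varepsilon^{1/4}$ validates inequality (**).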

We now proceed to prove a corresponding upper bound on $S(V_0)$. We will
partition the vertices of $V_0$ into three classes $V_{med}$, $V_{big}$ and $V_{huge}$, according to
their degrees, and will see that, as expected, the majority of the contribution
to $S(V_0)$ comes from $V_{med}$, whereas the contribution from the other two
classes is negligible. Let
\[ V_{med} = \{ v \in V_0 : \forall j\ \ \deg_j(v) \le (d_j + \varepsilon) p |W_j| \}, \]
\[ V_{huge} = \{ v \in V_0 : \exists j\ \ \deg_j(v) > D p |W_j| \} \]
and
\[ V_{big} = V_0 \setminus (V_{med} \cup V_{huge}). \]
Note that $S(V_0) = S(V_{med}) + S(V_{big}) + S(V_{huge})$. As in the case of the lower
bound, it is immediate that
\[ S(V_{med}) \le |V_0| \prod_j (d_j + \varepsilon) p |W_j| \overset{(***)}{\le} |V_0|\, p^k (1 + \varepsilon^{1/3}/2) \prod_j d_j |W_j|, \tag{7} \]
where, as above, (***) follows from the assumptions that $d_j \ge \varepsilon^{1/4}$ and
$k \le \varepsilon^{-1/4}$.

Next, observe that by the $(\varepsilon,p)$-regularity of the $B_j$'s we have $|V_{big}| \le \varepsilon k |V_0|$.
Hence
\begin{align*}
S(V_{big}) &\le |V_{big}| (Dp)^k \prod_j |W_j| \le \varepsilon k |V_0| (Dp)^k \prod_j |W_j| \\
 &\le \varepsilon^{1/4} |V_0| (\varepsilon^{1/(2k)} D p)^k \prod_j |W_j| \le \varepsilon^{1/4} |V_0| (Dp)^k \prod_j d_j |W_j|. \tag{8}
\end{align*}
Finally, by the assumptions on the maximum degree and the lack of
$(D,p)$-dense patches, for all $v \in V_0$ and for all $j = 1, \dots, k$ we have $\deg_j(v) \le Dnp$
and $|V_{huge}| \le kn/\log n$. Hence
\[ S(V_{huge}) \le k \frac{n}{\log n} (Dnp)^k = o\Big(|V_0|\, p^k \prod_j d_j |W_j|\Big). \tag{9} \]
Putting (7), (8) and (9) together gives the required upper bound on $S(V_0)$.

□

Of course, the above result can easily be adapted to asymptotically count
copies of star forests as well.
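
The counting identity at the start of the proof, $S(V) = \sum_{v \in V} \prod_j \deg_j(v)$, can be checked on a toy instance. The following is only a minimal sketch: the adjacency-dictionary encoding and the function names are assumptions made for illustration, not notation from the paper.

```python
from itertools import product

def count_stars_by_degrees(V0, B):
    """Sum over centres v of the product of deg_j(v), j = 1..k.

    B is a list of k dicts (hypothetical encoding); B[j][v] is the set of
    neighbours of v inside W_j.
    """
    total = 0
    for v in V0:
        prod = 1
        for Bj in B:
            prod *= len(Bj.get(v, ()))
        total += prod
    return total

def count_stars_brute_force(V0, B):
    """Enumerate every choice of one leaf per set W_j for every centre v."""
    count = 0
    for v in V0:
        leaf_choices = [sorted(Bj.get(v, ())) for Bj in B]
        count += sum(1 for _ in product(*leaf_choices))
    return count

# Toy instance with k = 2 bipartite graphs on (V0, W_1) and (V0, W_2).
V0 = ["v1", "v2", "v3"]
B = [
    {"v1": {"a1", "a2"}, "v2": {"a1"}, "v3": set()},
    {"v1": {"b1"}, "v2": {"b1", "b2", "b3"}, "v3": {"b2"}},
]
stars_formula = count_stars_by_degrees(V0, B)
stars_enumerated = count_stars_brute_force(V0, B)
```

Here the centre `v3` contributes nothing because it has no neighbour in the first set, which is the same mechanism by which low-degree vertices are simply dropped in the lower bound.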

Example 4.9 Consider a graph $G \in \mathcal{G}$, a star forest $S$, and a partition $\pi_0$
of $V = V(G)$ as in Section 3. For each $a = 1, \dots, \nu$ and $b = 1, \dots, \phi_a$, let
$V'_a \subseteq V_a$ and $W'_{ab} \subseteq W_{ab}$ be sets of size at least $n/\log n$. By Property (P5)
of $\mathcal{G}$ there are no $(2,p)$-dense patches in $G$. Moreover, by the same property,
the bipartite subgraphs $F'_{ab} = G[V'_a, W'_{ab}]$ are all $(2n^{-1/5},p)$-regular. Thus,
by $\nu$ simultaneous applications of Proposition 4.8 (one for each star of $S$),
we obtain an asymptotic formula for the number $\kappa'$ of the constellations in
$G$ contained in the union of the $F'_{ab}$'s:
\[ \kappa' \sim \prod_{a=1}^{\nu} |V'_a| \prod_{b=1}^{\phi_a} p\, |W'_{ab}|. \]
In particular, the number $\kappa = |\mathcal{C}|$ of all constellations in $G$ consistent with
$\pi_0$ satisfies
\[ \kappa = |\mathcal{C}| \sim m^{\nu+\phi} p^{\phi}, \quad \text{where } m = \frac{n}{\nu+\phi}. \]
(See, e.g., Figure 4.)

\[ \kappa \sim m^5 p^4 \times m \times m^2 p \times m^2 p \times m = m^{11} p^6 \]

Figure 4: Illustration of the example



Finally, we state a sparse version of the Szemerédi Regularity Lemma.
Theorem 4.10 below was independently observed by Y. Kohayakawa and V.
Rödl and, in fact, its proof is a simple modification of Szemerédi's proof. For
applications of this variant of the regularity lemma see [20], [22] and [16],
Section 8.3.

Theorem 4.10 (Sparse Regularity Lemma) For all $\varepsilon > 0$, $D > 0$, and
integers $t_0$ and $r$, there exist $T_0$ and $n_0$ such that if $G_1, \dots, G_r$ are graphs with
a common vertex set $V$, $|V| = n \ge n_0$, and $p = p(n)$ is such that $G_i$ does not
have $(D,p)$-dense patches for all $i = 1, 2, \dots, r$, then every equipartition of $V$
into $t_0$ parts has a refined equipartition $V = W_1 \cup \dots \cup W_t$, where $t_0 \le t \le T_0$,
and all but at most $\varepsilon n^2 p$ edges of $G_1 \cup \dots \cup G_r$ belong to the pairs $\{W_i, W_j\}$
that are $(\varepsilon,p)$-regular with respect to all graphs $G_i$, $i = 1, \dots, r$.

This theorem is essentially the same as the one presented in [22]. The theorem
in [22] guarantees that all but an $\varepsilon$-proportion of the pairs $(W_i, W_j)$ in the
partition are $(\varepsilon,p)$-regular with respect to all $G_i$. Given the fact that there is
an upper bound on the $p$-density of every pair (i.e., no $(D,p)$-dense patches),
we have that at most an $\varepsilon D$-proportion of the edges belong to the exceptional pairs.
Hence the theorems are equivalent (after re-scaling the constants).

4.2 The Subgraph Regularity Lemma

Let us summarize the state of affairs at the end of Section 3. Given a graph
$G \in \mathcal{G} \setminus \mathcal{R}$, as a far reaching consequence of assumption (3) of Theorem
2.1, we have constructed a star forest $S$ and a partition $\pi_0$ of the vertices
of $G$ into $|V(S)|$ sets of equal size $m = \lfloor n/|V(S)| \rfloor$ and, possibly, one set
of less than $|V(S)|$ vertices. (For clarity, we will be further assuming that
$n$ is divisible by $|V(S)|$.) Based on this partition, we have focused on the
set $\mathcal{C}$ of all constellations, that is all copies of $S$ in $G$ that are consistent
with $\pi_0$. Among the constellations we have distinguished a set $\mathcal{S}$ of special
constellations. These are the ones that correspond to selected copies $M_x$
of a certain graph $M$, such that $G \cup M_x \in \mathcal{R}$. Later in the proof of our
main result, we are going to apply a regularity lemma to the family $\mathcal{S}$. The
purpose of this section is to prove such a regularity lemma in a more abstract
setting, not using the concrete definition of the family $\mathcal{S}$.

Consider a hypergraph whose vertices are the edges of $G$ and whose edges
are the members of $\mathcal{S}$. By abuse of notation we may also denote the hypergraph
itself by $\mathcal{S}$. As we roughly explained in Section 2.2, we wish to apply
a regularity lemma to $\mathcal{S}$, and partition its vertices, i.e. the edges of $G$.
Our objective will be to find such a partition so that various parts of the
partition will carry the edges of $\mathcal{S}$ (special constellations) in a fairly regular
manner with respect to all constellations, just as regular pairs in a usual
Szemerédi partition of a graph have edge density that is, in a sense, smoothly
distributed.

This partition of $\mathcal{S}$ will rely heavily on the underlying graph $G$, and we
will start out by partitioning the vertices of $G$, thus inducing a partition of
the edges of $G$ (which are the vertices of $\mathcal{S}$). Only in the next step will we
focus on the sets thus formed and introduce a further partition of the edges of
$G$ that is not induced simply by partitioning the vertices of $G$. We will then
iterate this procedure in a manner similar to the one introduced by Frankl
and Rödl in [12], see the next subsection. We are aware that the two roles of
edges of $G$ as vertices of $\mathcal{S}$ may be confusing, and hence, having pointed out
the hypergraph structure, we will now stick to the language of subgraphs,
thus avoiding ambiguity.

Let $F$ be the set of all edges of $G$ with one endpoint in $V_a$ and the other
in $W_{ab}$ for all $a = 1, \dots, \nu$ and $b = 1, \dots, \phi_a$. We will identify $F$ with the
spanning subgraph of $G$ consisting of all these edges.






The regularity lemma we are aiming at will later be applied only to subgraphs
$F$ of graphs $G \in \mathcal{G} \setminus \mathcal{R}$, created in the above way by selecting first a
star forest $S$ and an initial partition $\pi_0$. However, it will be convenient to
formulate it more generally for all graphs like $F$, that is, for graphs structured
by $S$ and $\pi_0$ but with no reference to $\mathcal{G}$ and to assumption (3) of Theorem
2.1. Moreover, with no additional effort, our regularity lemma can be proved
with respect to subgraphs other than $S$. Therefore, we will soon set up a
general framework in which the regularity lemma will be stated and proved,
bearing in mind that the only application we need is that for the star forest
$S$. But first, in order to give stronger foundations for our generalization, we
describe one more instance of regularity.



4.2.1 The Hypergraph Regularity of Frankl and Rödl

Recently, Frankl and Rödl [12] proved a powerful regularity lemma for
3-uniform hypergraphs. A simplified version of that lemma is described briefly
here as another piece of motivation for the general framework we are aiming at.

Given three disjoint sets $V_1, V_2, V_3$, a triad is a triple $F = (F^{12}, F^{23}, F^{13})$ of
bipartite graphs with vertex sets $V_1 \cup V_2$, $V_2 \cup V_3$ and $V_1 \cup V_3$. We will identify
$F$ with the union of its three component graphs. We refer to a 3-partite
3-uniform hypergraph $\mathcal{H}$ with a 3-partition $(V_1, V_2, V_3)$ as a 3-hypergraph. For
a triad $F$ with the same vertex partition as $\mathcal{H}$, we say that $F$ underlies $\mathcal{H}$ if
$\mathcal{H} \subseteq \mathrm{Tr}(F)$, where $\mathrm{Tr}(F)$ is the family of the vertex sets of the triangles in
the graph $F$.

Let $\mathcal{H}$ be a 3-hypergraph with an underlying triad $F$. The density of $\mathcal{H}$
with respect to $F$ is defined by
\[ d_{\mathcal{H}}(F) = \frac{|\mathcal{H} \cap \mathrm{Tr}(F)|}{|\mathrm{Tr}(F)|}. \]
In other words, the density measures the proportion of triangles of $F$ which
are triples of $\mathcal{H}$.
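
To make the definition of the density with respect to a triad concrete, here is a small sketch that lists the triangles of a triad and computes the proportion of them that are triples of the hypergraph. The set-of-pairs encoding and the function names are assumptions chosen for the example.

```python
from itertools import product

def triangles(V1, V2, V3, F12, F23, F13):
    """Return Tr(F): all (a, b, c) in V1 x V2 x V3 spanning a triangle of F."""
    return {
        (a, b, c)
        for a, b, c in product(V1, V2, V3)
        if (a, b) in F12 and (b, c) in F23 and (a, c) in F13
    }

def triad_density(H, tri):
    """Density of the 3-hypergraph H with respect to the triad with triangle set tri."""
    if not tri:
        raise ValueError("the triad contains no triangles")
    return len(H & tri) / len(tri)

# A toy triad on three 2-element classes.
V1, V2, V3 = [0, 1], [2, 3], [4, 5]
F12 = {(0, 2), (0, 3), (1, 2), (1, 3)}
F23 = {(2, 4), (3, 4), (3, 5)}
F13 = {(0, 4), (1, 4), (1, 5)}
tri = triangles(V1, V2, V3, F12, F23, F13)

# A hypergraph underlain by the triad: every triple is a triangle of F.
H = {(0, 2, 4), (1, 3, 5)}
density = triad_density(H, tri)
```

In this instance the triad has five triangles and two of them are triples of the hypergraph.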

Let $\delta > 0$. A 3-hypergraph $\mathcal{H}$ is said to be $\delta$-regular with respect to an
underlying triad $F$ if for every subtriad $Q = (Q^{12}, Q^{23}, Q^{13})$ of $F$, $Q^{12} \subseteq F^{12}$,
$Q^{23} \subseteq F^{23}$, $Q^{13} \subseteq F^{13}$, with $|\mathrm{Tr}(Q)| \ge \delta |\mathrm{Tr}(F)|$ we have
\[ |d_{\mathcal{H}}(Q) - d_{\mathcal{H}}(F)| < \delta. \]
Thus, the $\delta$-regularity of $\mathcal{H}$ reflects low discrepancy between the densities of
$\mathcal{H}$ with respect to those subtriads of $F$ which are relatively rich in triangles.
With a fixed $\mathcal{H}$ in mind, we call the triad itself $\delta$-regular. A triad which is
not $\delta$-regular is called $\delta$-irregular.

Let $V$ be a set, and let $[V]^2$ denote the set of all 2-element subsets of $V$.
An $(l, t, \varepsilon)$-partition $\Pi$ of $[V]^2$ consists of an equipartition $V = V_1 \cup \dots \cup V_t$,
together with a system of edge-disjoint bipartite graphs $F^{ij}_k$ with bipartitions
$(V_i, V_j)$, $1 \le i < j \le t$, $1 \le k \le l_{ij} \le l$, such that

• $\left| \bigcup_{k=1}^{l_{ij}} F^{ij}_k \right| = |V_i||V_j|$ for all $i, j$, $1 \le i < j \le t$, and

• all but at most $\varepsilon \binom{t}{2} m^2$ pairs $\{v_i, v_j\}$, $v_i \in V_i$, $v_j \in V_j$, $1 \le i < j \le t$,
belong to graphs $F^{ij}_k$ which are $\varepsilon$-regular.

Note that there are at most $\binom{t}{3} l^3$ triads $F$ made up of the graphs of $\Pi$. Let
be the set of all $\delta$-irregular triads among them.

For a 3-uniform hypergraph $\mathcal{H} = (V, E)$, with $|V| = n$, we say that an
$(l, t, \varepsilon)$-partition $\Pi$ of $[V]^2$ is $\delta$-regular if

|Tr(^)l < 

Theorem 4.11 For every $\delta > 0$, integers $t_0$ and $l_0$, and all decreasing
functions $\varepsilon(\ell)$, there exist $T_0$, $L_0$, and $N_0$ such that any 3-uniform hypergraph
$\mathcal{H} = (V, E)$ with $|V| \ge N_0$ admits a $\delta$-regular $(l, t, \varepsilon(l))$-partition for some $t$
and $l$ satisfying $t_0 \le t \le T_0$ and $l_0 \le l \le L_0$.

Remark 4.1 In [12] a more general and powerful hypergraph lemma guarantees,
for every integer function $r = r(\ell)$, a stronger, $(\delta, r)$-regular partition.
Above we discussed the case $r = 1$ only.

4.2.2 General Subgraph Framework 

In what follows $H$ always stands for a fixed graph with respect to copies of
which other graphs are regularized. The reader can bear in mind that the
cases of $H = K_2$ and $H = K_3$ correspond, respectively, to the Szemerédi
and Frankl-Rödl regularity schemes, and that $H = S$, the star forest from
Definition 3.3, will be our principal (and only) application.

Given a graph $H$ with $h$ vertices and an integer $m$, let $H_m$ be the order
$m$ blow-up of $H$, i.e., the $h$-partite graph obtained by replacing each vertex
$x$ of $H$ by a set $V_x$ of $m$ vertices, and by replacing every edge $xy$ of $H$ by the
complete bipartite graph $K_{m,m}$ spanned between $V_x$ and $V_y$. Let $\mathcal{F}(H, m)$
denote the family of all subgraphs $F$ of $H_m$. The graphs $F \in \mathcal{F}(H, m)$ will
be called $H$-graphs and a copy $H'$ of $H$ in $F$ will be declared consistent if for
each $x \in V(H)$, its image $x'$ belongs to $V_x$ (see Figure 5). By a copy of $H$ in
an $H$-graph we will always mean one which is consistent. Let $\mathcal{C}(F) = \mathcal{C}_H(F)$
be the set of all copies of $H$ in $F$.
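
The notion of a consistent copy can be illustrated by brute-force enumeration: pick one vertex from each class $V_x$ and keep the choice if every edge of $H$ is mapped to an edge of $F$. This is a sketch under assumed data structures (a dictionary of classes and a set of `frozenset` edges), not an efficient algorithm.

```python
from itertools import product

def consistent_copies(H_vertices, H_edges, parts, F_edges):
    """List all consistent copies of H in an H-graph F.

    parts[x] is the class V_x (a list of vertices) and F_edges is a set of
    frozenset({u, v}) edges. A copy is returned as a dict x -> image of x,
    where the image of x always lies in V_x.
    """
    copies = []
    for choice in product(*(parts[x] for x in H_vertices)):
        image = dict(zip(H_vertices, choice))
        if all(frozenset((image[x], image[y])) in F_edges for x, y in H_edges):
            copies.append(image)
    return copies

# H = C_4 on vertices a, b, c, d, blown up with m = 2.
H_vertices = ["a", "b", "c", "d"]
H_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
parts = {x: [x + "1", x + "2"] for x in H_vertices}
F_edges = {
    frozenset(e)
    for e in [("a1", "b1"), ("b1", "c1"), ("c1", "d1"),
              ("d1", "a1"), ("a2", "b1"), ("d1", "a2")]
}
copies = consistent_copies(H_vertices, H_edges, parts, F_edges)
```

In the toy graph above the two consistent copies differ only in the image of `a`.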

Besides the graph $F$, the other important input to our regularity lemma is an
arbitrarily specified subfamily of $\mathcal{C}(F)$, the elements of which will be called
special copies. The subfamily itself will be denoted by $\mathcal{S}$. By choosing this
notation we put emphasis on our primary application with $H = S$ and $\mathcal{S}$
the family of special constellations. To obtain the standard Szemerédi setting
take $H = K_2$, $F = K_{m,m}$ and let $\mathcal{S}$ be the set of edges of a bipartite graph. For
the hypergraph setting of Frankl and Rödl take $H = K_3$, $F = K_{m,m,m}$ and
let $\mathcal{S}$ be the set of triples of a 3-hypergraph.

4.2.3 Measuring Density 

In the classical case of graphs, regularity is related to the density measured
by the ratio of the number of edges to all potential edges. For 3-uniform
hypergraphs, regularity will be with respect to the ratio of triples to the
triangles in the underlying triad. Similarly, we will measure the density of
various (spanning) subgraphs $R$ of $F$ by the ratio of the number of special
copies of $H$ to all copies of $H$ contained in $R$. Since our primary application
is in the sparse case, the density will have to be normalized by a scaling
factor $p^*$, which will typically depend both on $H$ and on the average density
of the sparse graph in question.

Having fixed $H$, let us also fix $F \in \mathcal{F}(H, m)$ with $V(F) = \bigcup_{x \in V(H)} V_x$,
$\mathcal{S} \subseteq \mathcal{C}_H(F)$ and $0 < p^* \le 1$. Any subgraph $R$ of $F$ is itself a member of
$\mathcal{F}(H, m)$ and thus $\mathcal{C}(R) = \mathcal{C}_H(R)$ is already defined, meaning the family of
copies of $H$ in $R$. As for the special copies, though, they can only be inherited
from $F$. We set
\[ c_R = |\mathcal{C}(R)| \quad \text{and} \quad s_R = |\mathcal{S} \cap \mathcal{C}(R)|. \]

Figure 5: consistent copy $abcd$ of $H = C_4$

For all subgraphs $R \subseteq F$ with $c_R > 0$, we define their $(\mathcal{S},p^*)$-density by
\[ d_R = \frac{s_R}{p^* c_R}. \tag{10} \]

The normalizing factor $p^*$ helps in bounding the density from below. It
should be related closely to the ratio $s_F/c_F$ between the numbers of special
copies and all copies of $H$ in the entire graph $F$.
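
As a small illustration of the normalized density, the sketch below computes the ratio of special copies to all copies inside a subgraph $R$, divided by $p^*$. Copies are encoded as `frozenset`s of edge labels; this encoding and the function name are assumptions made for the example.

```python
from fractions import Fraction

def normalized_density(copies, special, R_edges, p_star):
    """Return s_R / (p* c_R) for the subgraph R with edge set R_edges.

    copies is the list of all copies of H in F, each a frozenset of edges;
    special is the set of special copies. A copy lies in R when all of its
    edges do.
    """
    in_R = [c for c in copies if c <= R_edges]
    c_R = len(in_R)
    if c_R == 0:
        raise ValueError("the density is only defined when c_R > 0")
    s_R = sum(1 for c in in_R if c in special)
    return Fraction(s_R, c_R) / Fraction(p_star)

# Four copies of a two-edge graph H; exactly one of them is special.
copies = [
    frozenset({"e1", "e2"}),
    frozenset({"e1", "e3"}),
    frozenset({"e2", "e4"}),
    frozenset({"e3", "e4"}),
]
special = {copies[0]}
p_star = Fraction(1, 4)  # chosen equal to s_F / c_F, so that d_F = 1

d_F = normalized_density(copies, special, {"e1", "e2", "e3", "e4"}, p_star)
d_small = normalized_density(copies, special, {"e1", "e2"}, p_star)
```

The second call shows that a subgraph carrying a single copy that happens to be special has density $1/p^*$, far above the density of $F$ itself.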

The price we pay for using a normalized density is the potential presence
of some kind of dense patches. It is unavoidable to have small subgraphs $R$
with unbounded $d_R$ (e.g., if $c_R = s_R = 1$ and $p^* = o(1)$). Thus, all we can do
is to control the $(\mathcal{S},p^*)$-density of large subgraphs $R$, where "large" means
"rich in copies of $H$". Set
\[ \kappa = |\mathcal{C}(F)| \]
for the total number of copies of $H$ in $F$ (the "total volume" of $F$).

Definition 4.6 For fixed $H$ and $D^* > 0$, an $F \in \mathcal{F}(H, m)$ with $n = hm$
vertices, together with a nonempty family $\mathcal{S} \subseteq \mathcal{C}(F)$, is said to have no
$(D^*,p^*)$-dense $H$-patches, if for every subgraph $R \subseteq F$ with $c_R \ge \kappa/\log^2 n$
we have $d_R \le D^*$.

Note that the assumption of no dense patches applied to $R = F$ and the
definition of density in (10) imply together that
\[ p^* \ge \frac{s_F}{D^* c_F} \ge \frac{1}{D^*} n^{-h} \tag{11} \]
unless $s_F = 0$.

The property of having no dense patches, together with an upper bound
on the number of copies that one edge may belong to, implies that subgraphs
of $F$ with few copies of $H$ must also have few special ones.

Definition 4.7 An $H$-graph $F$ with $n$ vertices is called $H$-uniform if no edge
of $F$ belongs to more than $\kappa/\log^3 n$ copies of $H$.

Proposition 4.12 Assume that an $H$-graph $F$ (together with $\mathcal{S}$) has no
$(D^*,p^*)$-dense $H$-patches and is $H$-uniform. Let $R \subseteq F$ with $c_R < \kappa/\log^2 n$.
Then $s_R \le (1 + o(1)) D^* p^* \kappa/\log^2 n$.

Proof: Given $R$ with $c_R < \kappa/\log^2 n$, we add to it extra edges of $F$, enlarging
$R$ to obtain a supergraph $R'$ with $\kappa/\log^2 n \le c_{R'} = (1 + o(1))\kappa/\log^2 n$. This
can easily be accomplished by adding one edge at a time, and noting that
each edge can increase the number of copies of $H$ by at most $\kappa/\log^3 n$. Thus,
because there are no $(D^*,p^*)$-dense patches in $F$, it follows that $d_{R'} \le D^*$,
and hence $s_R \le s_{R'} \le D^* p^* c_{R'} = (1 + o(1)) D^* p^* \kappa/\log^2 n$. □

4.2.4 Partitions and Polyads 

The subgraph of an $H$-graph $F$, induced by a pair $(V_x, V_y)$ corresponding to
the edge $xy \in E(H)$, will be denoted by $F_{xy}$. The sets $V_x$ for all $x \in V(H)$ and
the graphs $F_{xy}$ for all $xy \in E(H)$ form the initial partition $\Pi_0$ of a fixed
$H$-graph $F$. The regularity lemma we prove in this section involves refinements
of the sets and graphs of $\Pi_0$ and towards this we define a $(t, l)$-partition, by
generalizing the definition of $\Pi_0$.

Informally speaking, a $(t, l)$-partition is an equipartition obtained by splitting
each set $V_x$ into $t$ disjoint subsets $V_x^1, \dots, V_x^t$. This partitions every
subgraph $F_{xy}$ into $t^2$ tubes
\[ F_{xy}^{i,j} = F_{xy}[V_x^i, V_y^j], \quad 1 \le i, j \le t, \]
which are induced bipartite subgraphs spanned between the partition sets.
We then partition every tube into $l(xy, i, j) \le l$ edge-disjoint subgraphs.

Definition 4.8 A $(t, l)$-partition $\Pi$ consists of the following ingredients:
\[ \text{equipartitions } V_x = V_x^1 \cup \dots \cup V_x^t, \quad x \in V(H) \tag{12} \]
and
\[ \text{partitions } F_{xy}^{i,j} = \bigcup_{k=1}^{l(xy,i,j)} F_{xy}^{i,j,k}, \quad xy \in E(H),\ i, j = 1, \dots, t. \tag{13} \]
A refinement of $\Pi$ is any $(t', l')$-partition, with $t' \ge t$, $l' \ge l$, which consists
of refinements of (12) and (13).

In particular, any $(t, l)$-partition is a refinement of the initial $(1, 1)$-partition
$\Pi_0$, and a $(t, l)$-partition divides the vertex set into $th$ classes and the edge
set into $\sum_{xy \in E(H)} \sum_{i,j} l(xy, i, j) \le |E(H)| t^2 l$ classes.

We now define a basic notion whose role is similar to that of bipartite
subgraphs in the classical Szemerédi Lemma, and to that of triads in the
hypergraph regularity lemma of Frankl and Rödl.

Definition 4.9 A polyad $P$ consistent with a $(t, l)$-partition $\Pi$ is a subgraph
of $F$ obtained by selecting one subclass $V_x^{i(x)}$ for each $x \in V(H)$, and then
one subgraph $F_{xy}^{i(x),i(y),k(xy)}$ for each $xy \in E(H)$. More formally, given $i(x)$
for $x \in V(H)$ and $k(xy)$ for $xy \in E(H)$,
\[ P = \bigcup_{xy \in E(H)} F_{xy}^{i(x),i(y),k(xy)} \]
is the corresponding polyad.

Observe that there is only one polyad consistent with the original partition
$\Pi_0$, which is $F$ itself, and, more generally, there are at most $s = t^h l^{|E(H)|}$
polyads consistent with a $(t, l)$-partition (with equality if every tube is
partitioned into exactly $l$ parts). Let $\mathcal{P} = \mathcal{P}_\Pi$ be the set of all polyads consistent
with a $(t, l)$-partition $\Pi$. Note also that every copy of $H$ belongs to exactly
one polyad. Consequently, recalling that $\kappa = |\mathcal{C}(F)|$ is the total number of
copies of $H$ in $F$, we have
\[ \sum_{P \in \mathcal{P}_\Pi} c_P = \kappa. \tag{14} \]
Finally, notice that if $\Psi$ is a refinement of $\Pi$ then for every polyad $R \in \mathcal{P}_\Psi$
there exists a unique polyad $P \in \mathcal{P}_\Pi$ such that $R \subseteq P$.
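
The bound on the number of polyads can be checked by enumerating index choices: one class index $i(x)$ per vertex of $H$ and one subgraph index $k(xy)$ per edge of $H$. The sketch below assumes a dictionary `num_classes` mapping `(edge, i, j)` to the number of subgraphs in the corresponding tube; that encoding is hypothetical.

```python
from itertools import product

def enumerate_polyads(H_vertices, H_edges, t, num_classes):
    """Yield the index pairs (i, k) that determine the polyads of a partition.

    i maps each vertex x of H to a class index in 1..t, and k maps each edge
    xy of H to the index of one of the num_classes[(xy, i(x), i(y))]
    subgraphs of the tube between the chosen classes.
    """
    for idx in product(range(1, t + 1), repeat=len(H_vertices)):
        i = dict(zip(H_vertices, idx))
        ranges = [
            range(1, num_classes[(e, i[e[0]], i[e[1]])] + 1) for e in H_edges
        ]
        for ks in product(*ranges):
            yield i, dict(zip(H_edges, ks))

# H is a path a-b-c (h = 3 vertices, 2 edges), with t = 2 and l = 3.
H_vertices = ["a", "b", "c"]
H_edges = [("a", "b"), ("b", "c")]
t, l = 2, 3
full = {(e, i, j): l for e in H_edges
        for i in range(1, t + 1) for j in range(1, t + 1)}
count_full = sum(1 for _ in enumerate_polyads(H_vertices, H_edges, t, full))

# If one tube is split into fewer than l subgraphs, the count drops.
fewer = dict(full)
fewer[(("a", "b"), 1, 1)] = 1
count_fewer = sum(1 for _ in enumerate_polyads(H_vertices, H_edges, t, fewer))
```

With every tube split into exactly $l$ subgraphs the count equals $t^h l^{|E(H)|} = 2^3 \cdot 3^2$, matching the equality case noted above.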

4.2.5 Regularity 

Polyads are the substructures of this construction which play the same role
as $\varepsilon$-regular pairs in a Szemerédi partition of a graph.

Definition 4.10 Let $\delta > 0$. Given an $H$-graph $F$ and a family $\mathcal{S}$ of special
copies of $H$ in $F$, a polyad $P$ is said to be $\delta$-regular, if for every subgraph
$Q \subseteq P$ with $c_Q \ge \delta c_P$, we have
\[ |d_P - d_Q| < \delta. \]
In other words, $\delta$-regularity of $P$ reflects low discrepancy between the
$(\mathcal{S},p^*)$-densities of those subgraphs of $P$ which are relatively rich in ordinary,
consistent copies of $H$. A polyad which is not $\delta$-regular will be called $\delta$-irregular,
or just irregular. The set of all $\delta$-irregular polyads of $\mathcal{P}_\Pi$ will be denoted by
$\mathcal{P}_\Pi^{irr}$ or just $\mathcal{P}^{irr}$.

We are now two notions away from our regularity lemma. The first of
them ensures $(\varepsilon,p)$-regularity of most subgraphs constituting a $(t, l)$-partition.

Definition 4.11 Let $\varepsilon > 0$ and $0 < p \le 1$. A $(t, l)$-partition $\Pi$ is called
$(\varepsilon,p)$-uniform if all but $\varepsilon|F|$ edges of $F$ are in $(\varepsilon,p)$-regular subgraphs $F_{xy}^{i,j,k}$.

The next definition specifies what kind of regularity we impose on partitions
in terms of $\delta$-regular polyads. Roughly, the irregular polyads together
cannot contain too many copies of $H$. Recall that $\kappa = |\mathcal{C}(F)|$.

Definition 4.12 Given an $H$-graph $F$ and a family $\mathcal{S}$ of special copies of $H$
in $F$, a $(t, l)$-partition $\Pi$ is $\delta$-regular if
\[ \sum_{P \in \mathcal{P}^{irr}} c_P \le \delta \kappa. \]

We now come to the central lemma of this section, which, when applied
with $H = S$, the star forest from Section 3, puts into action the mechanism
that makes the whole proof tick. It describes, in the spirit of the Szemerédi
Regularity Lemma, a decomposition of an $H$-graph $F$ in which the special
copies of $H$ are evenly distributed, i.e. it guarantees the existence of a
$\delta$-regular partition. In addition, at the same time the partition that is obtained
is $(\varepsilon,p)$-uniform. This extra feature is very useful in our application.

Lemma 4.13 (Subgraph Regularity Lemma) For all graphs $H$, constants
$\delta > 0$, $D > 0$, $D^* \ge 1$, and for all functions $\varepsilon(\ell) > 0$, there exist integers
$T_0$, $L_0$, $n_0$ such that

1. for every $n = hm \ge n_0$, $0 < p = p(n) \le 1$, a graph $F \in \mathcal{F}(H, m)$ with
$|V(F)| = n$ such that

(a) $F$ is $H$-uniform

(b) $F$ has no $(D,p)$-dense patches, and

2. for every $0 < p^* = p^*(n) \le 1$ and $\mathcal{S} \subseteq \mathcal{C}(F)$ such that $F$ (together with
$\mathcal{S}$) has no $(D^*,p^*)$-dense $H$-patches,

there exists a refinement $\Pi_1$ of the initial partition $\Pi_0$ which is a $\delta$-regular,
$(\varepsilon(l), p)$-uniform, $(t, l)$-partition for some $l \le L_0$ and $t \le T_0$.

Remark 4.2 In both classic cases, $H = K_2$ and $H = K_3$, all copies of $H$ in
$H_m$ are consistent, and consequently, the corresponding regularity lemmas
are true for arbitrary graphs $F$, not just for bipartite or tripartite graphs
$F \in \mathcal{F}(H, m)$. However, for general $H$ the "partiteness" seems necessary.
Consider, e.g., $H$ being a path on vertices 1, 2, 3, and let $F = K_{3m}$ be a
complete graph on vertex set $V = V_1 \cup V_2 \cup V_3$. Further, let $c_P$ count all
copies of $H$, while $s_P$ counts only those consistent with the partition $(V_1, V_2, V_3)$.
Then no regularity lemma may guarantee a $\delta$-regular partition of $F$. Indeed,
every polyad $P$ with large $s_P$ must contain a bipartite subgraph $R \subseteq F[V_1, V_2]$
with still large $c_R$ but, clearly, with $s_R = 0$, making $P$ $\delta$-irregular.

Remark 4.3 Our lemma admits edge partitions besides the more traditional
vertex partitions. Consequently, non-induced subgraphs are considered. It is
then closer in spirit to the hypergraph regularity lemma from [12], than the
classical Szemerédi lemma, where pairs of vertex partition classes determine
induced subgraphs.

Remark 4.4 If both $p$ and $p^*$ are constants, say 1, then the assumptions
about the absence of dense patches are vacuously satisfied with $D = D^* = 1$.
Indeed, e.g., by (10) we have then $d_R = s_R/c_R \le 1$ and no $(1, 1)$-dense
$H$-patches exist (see Definition 4.6). In the sparse case, however, the assumption
of no $(D^*,p^*)$-dense $H$-patches, which plays a key role in the proof, can be
very hard to verify. For the application in this paper, i.e. for H being a star 
forest S with edges, and p* = p®, where p = p{n) is as in the Setup on 
page 16, this is done in Section 4.4 with considerable technical effort. 

Remark 4.5 Using the relevant definitions, it is easy to see that the condition
of not having any $(D,p)$-dense patches bounds $p$ from below: $p \ge 1/(n^2 D)$;
similarly, as observed above (see (11)), not having $(D^*,p^*)$-dense
$H$-patches yields $p^* = \Omega(n^{-h})$.

Remark 4.6 We will use this lemma with $\varepsilon(\ell) \le \ell^{-4\phi}$. The reason for
having $\varepsilon$ decrease with $\ell$ so quickly is that we will need $\varepsilon$ to be much smaller
than $1/\ell$, a lower bound on the average $p$-density of the resulting subgraphs
(see the proof of Lemma 5.4).



4.3 Proof of the Subgraph Regularity Lemma 

The proof is based on a technique from the original Szemerédi Regularity
Lemma: refining the initial partition $\Pi_0$ until it becomes $\delta$-regular. This
procedure will be monitored by the index of a partition. We will see that
as long as a current partition is not $\delta$-regular, it can be refined, resulting in
a partition with a substantially larger index (this is Lemma 4.15). A priori
bounds on the index given in Lemma 4.16 will guarantee that this process
terminates successfully after a bounded number of steps. Throughout, the
$(\varepsilon,p)$-uniformity of the partitions will be maintained as well.



4.3.1 Pumping the Index 

Let
\[ \sigma_R = \frac{c_R}{\kappa} \tag{15} \]
be the relative volume of a subgraph $R$ of $F$. It measures what fraction
of the copies of $H$ which can be found in $F$ are contained in $R$. Note that in
view of (14),
\[ \sum_{P \in \mathcal{P}_\Pi} \sigma_P = 1. \tag{16} \]
Let
\[ \mathrm{Index}(\Pi) = \sum_{P \in \mathcal{P}_\Pi} \sigma_P d_P \log d_P \]
be the index of a partition $\Pi$. Thus, the index is a weighted sum of a convex
function ($x \log x$) of the $(\mathcal{S},p^*)$-densities of the polyads consistent with $\Pi$.
The merit of this convex function is that whenever a $(t, l)$-partition is not
$\delta$-regular it is possible to refine it into a $(t', l')$-partition with a substantially
larger index.

To start with, let $\Psi$ be any refinement of a current partition $\Pi$ as defined
in Definition 4.8, and, for a given polyad $P \in \mathcal{P}_\Pi$, let $A(P) = A(\Psi, P)$ be the
set of all polyads $R$ consistent with $\Psi$ and such that $R \subseteq P$. Clearly,
\[ \mathcal{P}_\Psi = \bigcup_{P \in \mathcal{P}_\Pi} A(P). \]
Setting
\[ \mu_R = \frac{c_R}{c_P} = \frac{\sigma_R}{\sigma_P} \]
and using the identity $\sigma_R = \sigma_P \mu_R$ we have
\[ \mathrm{Index}(\Psi) = \sum_{R \in \mathcal{P}_\Psi} \sigma_R d_R \log d_R = \sum_{P \in \mathcal{P}_\Pi} \sigma_P \sum_{R \in A(P)} \mu_R d_R \log d_R. \]
Observe that $d_P = \sum_{R \in A(P)} \mu_R d_R$ and $\sum_{R \in A(P)} \mu_R = 1$. Hence, by Jensen's inequality,
for each $P \in \mathcal{P}_\Pi$,
\[ \sum_{R \in A(P)} \mu_R d_R \log d_R \ge \Big( \sum_{R} \mu_R d_R \Big) \log \Big( \sum_{R} \mu_R d_R \Big) = d_P \log d_P, \]
and consequently $\mathrm{Index}(\Psi) \ge \mathrm{Index}(\Pi)$.
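
The monotonicity of the index under refinement can be observed numerically. The sketch below represents each polyad only by its pair of counts (number of copies, number of special copies); this reduced encoding, the function names, and the sample numbers are assumptions made for illustration.

```python
import math

def xlogx(x):
    """The convex function x log x, extended by 0 at x = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def index(polyads, kappa, p_star):
    """Weighted sum of d_P log d_P with weights c_P / kappa.

    polyads is a list of (c_P, s_P) pairs; polyads without copies of H
    carry weight zero and are skipped.
    """
    total = 0.0
    for c, s in polyads:
        if c == 0:
            continue
        sigma = c / kappa
        d = s / (p_star * c)
        total += sigma * xlogx(d)
    return total

kappa, p_star = 100, 0.5
# A partition with two polyads, and a refinement splitting each into two
# pieces whose copy counts and special-copy counts add up to the original.
pi = [(60, 12), (40, 20)]
psi = [(30, 3), (30, 9), (10, 10), (30, 10)]

index_pi = index(pi, kappa, p_star)
index_psi = index(psi, kappa, p_star)
```

The refinement separates pieces with different densities, so its index is strictly larger here; both values also stay above $-1/e$, the minimum of $x \log x$.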

The above application of Jensen's inequality can be sharpened significantly
in the case when $P$ is a $\delta$-irregular polyad, provided we impose an
extra constraint on the partition $\Psi$. Given a subgraph $Q$ of $P$, we say that a
partition $\Psi$ refines $Q$ if its edge partition (13) refines the partition $(Q, F - Q)$,
that is, every subgraph from (13) is either contained in or disjoint from $Q$.

Lemma 4.14 (Defect Lemma) Let $P \in \mathcal{P}_\Pi^{irr}$ be a $\delta$-irregular polyad and
$Q \subseteq P$ be such that

(i) $c_Q \ge \delta c_P$ and

(ii) $|d_P - d_Q| \ge \delta$.

If $\Psi$ is a refinement of $\Pi$ which refines $Q$ then
\[ \sum_{R \in A(P)} \mu_R d_R \log d_R \ge d_P \log d_P + \frac{2\delta^4}{d_P}. \]

For the sake of readability, we defer for a moment the technical proof of
Lemma 4.14. An immediate consequence of Lemma 4.14 is a jump in the
value of the index when a partition has many $\delta$-irregular polyads and its
refinement is properly chosen. A subgraph $Q$ which satisfies conditions (i)
and (ii) of Lemma 4.14 with respect to a given $\delta$-irregular polyad $P$ will be
called a witness of irregularity for $P$.

Lemma 4.15 (Index Pumping Lemma) If $\Pi$ is a $\delta$-irregular $(t, l)$-partition
of $F$ and $\Psi$ is a refinement of $\Pi$ which also refines at least one witness of
irregularity for each polyad $P \in \mathcal{P}^{irr}$, then
\[ \mathrm{Index}(\Psi) \ge \mathrm{Index}(\Pi) + \frac{\delta^5}{D^*}. \]

Proof: As before, but using Lemma 4.14 instead of Jensen's inequality,
\begin{align*}
\mathrm{Index}(\Psi) &= \sum_{P \in \mathcal{P}_\Pi} \sigma_P \sum_{R \in A(P)} \mu_R d_R \log d_R \\
 &\ge \sum_{P \in \mathcal{P}_\Pi} \sigma_P d_P \log d_P + \sum_{P \in \mathcal{P}^{irr}} \sigma_P \frac{2\delta^4}{d_P} \\
 &\ge \mathrm{Index}(\Pi) + \sum_{P \in \mathcal{P}^{irr},\ c_P \ge \kappa/\log^2 n} \sigma_P \frac{2\delta^4}{D^*}.
\end{align*}
For the last inequality above we used the assumption that there are no
$(D^*,p^*)$-dense $H$-patches in $F$, and thus, if $c_P \ge \kappa/\log^2 n$ then $d_P \le D^*$. By
the $\delta$-irregularity of $\Pi$ (see Definition 4.12) and by (15), we have
\[ \sum_{P \in \mathcal{P}^{irr}} \sigma_P > \delta. \]
Moreover, for sufficiently large $n$, recalling that $|\mathcal{P}| \le s = t^h l^{|E(H)|}$,
\[ \sum_{P :\ c_P < \kappa/\log^2 n} \sigma_P \le \sum_{P} \log^{-2} n \le s \log^{-2} n < \frac{\delta}{2}, \]
because $t$, $l$ and $H$ do not depend on $n$. Hence, in view of (16),
\[ \sum_{P \in \mathcal{P}^{irr},\ c_P \ge \kappa/\log^2 n} \sigma_P \frac{2\delta^4}{D^*} \ge \frac{\delta^5}{D^*}. \]
□

The Index Pumping Lemma works hand in hand with the following estimates.

Lemma 4.16 For all partitions $\Pi$ of $F$,
\[ -\frac{1}{e} \le \mathrm{Index}(\Pi) \le D^* \log D^* + o(1). \]

Proof: The lower bound follows easily from (16) and the fact that the function
$x \log x$ achieves its minimum at $x = 1/e$.
For the upper bound, we split
\[ \mathrm{Index}(\Pi) = \sum_{P :\ c_P \ge \kappa/\log^2 n} \sigma_P d_P \log d_P + \sum_{P :\ c_P < \kappa/\log^2 n} \sigma_P d_P \log d_P. \]
Because $F$ has no $(D^*,p^*)$-dense $H$-patches, we can bound the first term by
$D^* \log D^*$. As for the second term, by Proposition 4.12, $c_P < \kappa/\log^2 n$ implies
that $s_P \le (1 + o(1)) D^* p^* \kappa/\log^2 n$. Hence,
\[ \sigma_P d_P = \frac{s_P}{\kappa p^*} \le (1 + o(1)) \frac{D^*}{\log^2 n}. \]
Moreover, for all polyads $P$ we have $\log d_P = O(\log n)$, because by (11) (or
see Remark 4.5)
\[ d_P = \frac{s_P}{p^* c_P} \le \frac{1}{p^*} = O(n^h). \]
Hence,
\[ \sum_{P :\ c_P < \kappa/\log^2 n} \sigma_P d_P \log d_P \le s \cdot (1 + o(1)) \frac{D^*}{\log^2 n} \cdot O(\log n) = o(1), \]
also using the fact that the number of polyads is at most $s = t^h l^{|E(H)|} = O(1)$.
□

Remark 4.7 Note that we would not have been able to bound the second
term above as easily, had we used the more traditional functional, $\sum_P \sigma_P d_P^2$,
in the definition of index. The index function was introduced by Szemerédi
in [Sz] as the main tool in the proof of his regularity lemma (Theorem 4.5
above), and later used also in [FR] to prove the hypergraph regularity lemma
(Theorem 4.11 above). The idea of replacing $d^2$ by any convex function in
this context was suggested by Ajtai, Komlós, Szemerédi [E. Szemerédi, oral
communication, 1978]. Such an alternative index function first appeared in
print in [26].

The above two lemmas basically complete the proof of Lemma 4.13, provided
we can construct a required refinement which in addition is $(\varepsilon(l),p)$-uniform.
Then, iterating the process of refining a current partition, we must
reach a $\delta$-regular $(t, l)$-partition. The bounds in these lemmas imply an upper
bound on the number of iterations, solely in terms of $D^*$ and $\delta$.

In order to conclude the existence of constants $T_0$ and $L_0$ such that $t \le T_0$
and $l \le L_0$, we need to show how the refinement is created. Likewise, although
the asymptotic estimates we encountered so far contribute toward
the value of $n_0$, the most demanding constraint on $n_0$ comes from the
refinement step, which invokes Theorem 4.10. A description of how the refinement is
constructed is deferred to the next subsection. We complete this subsection
by giving a proof of the Defect Lemma.

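Proof of Lemma 4.14: Let $A(Q)$ be the set of all polyads $R$ consistent
with $\Psi$ and such that $R \subseteq Q$. Recall that $\mu_R = c_R/c_P$ and set
\[ \mu = \sum_{R \in A(Q)} \mu_R. \]
Since $\Psi$ refines $Q$, we have $\sum_{R \in A(Q)} c_R = c_Q$ and $\sum_{R \in A(Q)} s_R = s_Q$. Thus,
by (i), $\mu = c_Q/c_P \ge \delta > 0$ and
\[ \sum_{R \in A(Q)} \mu_R d_R = \frac{s_Q}{p^* c_P} = \mu d_Q. \]
Moreover, if $\mu = 1$ then $c_Q = c_P$, $s_Q = s_P$, and consequently, $d_P = d_Q$,
contradicting (ii). Hence, we may assume that $\mu < 1$, and by Jensen's
inequality,
\[ \sum_{R \in A(Q)} \mu_R d_R \log d_R \ge \mu d_Q \log d_Q \]
and
\[ \sum_{R \in A(P) \setminus A(Q)} \mu_R d_R \log d_R \ge (d_P - \mu d_Q) \log \Big( \frac{d_P - \mu d_Q}{1 - \mu} \Big). \]
So, setting $d = d_P$, $d' = d_Q$, and $d'' = \frac{d - \mu d'}{1 - \mu}$ for convenience,
\[ \sum_{R \in A(P)} \mu_R d_R \log d_R \ge \mu d' \log d' + (1 - \mu) d'' \log d''. \tag{17} \]
Consider the function
\[ f(x) = x \log \Big( \frac{x}{\mu} \Big) + (d - x) \log \Big( \frac{d - x}{1 - \mu} \Big). \]
Clearly, $f(\mu d) = d \log d$, and further, since $\mu d' + (1 - \mu) d'' = d$,
\[ f(\mu d') = \mu d' \log d' + (d - \mu d') \log \Big( \frac{d - \mu d'}{1 - \mu} \Big) = \mu d' \log d' + (1 - \mu) d'' \log d''. \tag{18} \]
Thus in view of (17),
\[ \sum_{R \in A(P)} \mu_R d_R \log d_R - d \log d \ge f(\mu d') - f(\mu d), \]
and we will be done if
\[ f(\mu d') - f(\mu d) \ge \frac{2\delta^4}{d}. \tag{19} \]
Towards this, compute
\[ f'(x) = \log \Big( \frac{x}{\mu} \Big) - \log \Big( \frac{d - x}{1 - \mu} \Big) \]
and
\[ f''(x) = \frac{1}{x} + \frac{1}{d - x}, \]
and note that $f'(\mu d) = 0$, and that $f''(x) \ge 4/d$ for all $0 < x < d$. Thus, by
the Lagrange formula for the remainder, we have, for some $\xi$ between $\mu d$ and
$\mu d'$,
\[ f(\mu d') - f(\mu d) = f''(\xi) \frac{(\mu d' - \mu d)^2}{2} \ge \frac{4}{d} \cdot \frac{\delta^4}{2} = \frac{2\delta^4}{d}, \]
since $\mu \ge \delta$ and $|d' - d| \ge \delta$. This proves (19), completing the proof of
Lemma 4.14. □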
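
The key estimate (19) can be sanity-checked numerically. The sketch below evaluates the function $f$ from the proof and compares the gain $f(\mu d') - f(\mu d)$ with the second-order bound $2\mu^2 (d' - d)^2 / d$, which is at least $2\delta^4/d$ whenever $\mu \ge \delta$ and $|d' - d| \ge \delta$. The function names and the sample values are assumptions made for illustration.

```python
import math

def f(x, d, mu):
    """f(x) = x log(x/mu) + (d - x) log((d - x)/(1 - mu)), with 0 log 0 = 0."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(x, mu) + term(d - x, 1 - mu)

def defect_gain(d, d_prime, mu):
    """f(mu d') - f(mu d); note that f(mu d) equals d log d."""
    return f(mu * d_prime, d, mu) - f(mu * d, d, mu)

def taylor_lower_bound(d, d_prime, mu):
    """(4/d) (mu d' - mu d)^2 / 2, using f'(mu d) = 0 and f''(x) >= 4/d."""
    return 2 * (mu * (d_prime - d)) ** 2 / d

# One sample point: d = 1, d' = 0.5, mu = 0.5, so delta may be taken as 0.5.
gain = defect_gain(1.0, 0.5, 0.5)
bound = taylor_lower_bound(1.0, 0.5, 0.5)
```

At this sample point the bound equals $2\delta^4/d = 0.125$ with $\delta = 0.5$, and the actual gain is only slightly larger, which suggests the estimate is not wasteful.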
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4.3.2 Refining a Partition 

We will now finish the proof of Lemma 4.13. As already observed prior to
the proof of Lemma 4.14, it remains to construct a $(t', l')$-partition $\Psi$ with
the following properties. Recall that a subgraph $Q$ which satisfies conditions
(i) and (ii) of Lemma 4.14 with respect to a given $\delta$-irregular polyad $P$ is
called a witness of irregularity for $P$. The new partition $\Psi$ must

1. be a refinement of a current $\delta$-irregular $(t, l)$-partition $\Pi$,

2. refine at least one witness of irregularity for each irregular polyad of $\Pi$,

3. be $(\varepsilon(l'),p)$-uniform.

Recall that a partition $\Pi$ gives rise to at most $s = t^h l^{|E(H)|}$ different polyads.
Suppose that polyads $P_1, \dots, P_{s'}$ are $\delta$-irregular with witnesses of their
irregularity $Q_1, \dots, Q_{s'}$. The requirement that the new partition must refine each
$Q_g$, $g = 1, \dots, s'$, is easy to fulfill. Let $\mathrm{Venn}\{Q_1, \dots, Q_{s'}\} = \{R_1, \dots, R_{s''}\}$,
$s'' \le 2^{s'}$, be the family of all graphs of the form $\bigcap_{g=1}^{s'} Q_g^{e_g}$, for $e_g \in \{0, 1\}$, where
\[ Q_g^{e_g} = \begin{cases} Q_g & e_g = 1 \\ F - Q_g & e_g = 0. \end{cases} \]
The graphs $R_g$, $g = 1, \dots, s''$, partition (by intersection) every subgraph $F_{xy}^{i,j,k}$
into smaller subgraphs $F_{xy}^{i,j,k,g} = F_{xy}^{i,j,k} \cap R_g$, that is,
\[ F_{xy}^{i,j,k} = \bigcup_{g=1}^{s''} F_{xy}^{i,j,k,g}. \]
This new partition $F_{xy}^{i,j,k,g}$, clearly, refines every subgraph $Q_g$, $g = 1, \dots, s'$.
Note also that the previous partition of every tube into at most $l$ subgraphs
is now refined into at most $l'$ finer subgraphs, with $l' = l s'' \le l 2^{s'} \le l 2^s$.
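
The Venn-diagram refinement can be sketched in a few lines: each edge is tagged with the tuple recording which witnesses contain it, and edges of one class with the same tag form one new class. The list-of-`frozenset` encoding of edge classes and the function name are assumptions made for the example.

```python
def venn_refine(classes, witnesses):
    """Refine edge classes by intersecting them with the Venn atoms of witnesses.

    classes is a list of frozensets of edges (the current edge partition) and
    witnesses is a list of sets of edges Q_1, ..., Q_s'. Every returned class
    is contained in or disjoint from each witness, and each original class
    splits into at most 2**len(witnesses) pieces.
    """
    refined = []
    for cls in classes:
        atoms = {}
        for edge in cls:
            # Membership pattern of this edge across all witnesses.
            signature = tuple(edge in Q for Q in witnesses)
            atoms.setdefault(signature, set()).add(edge)
        refined.extend(frozenset(a) for a in atoms.values())
    return refined

# Two edge classes and two witnesses of irregularity (edges labelled 1..6).
classes = [frozenset({1, 2, 3, 4}), frozenset({5, 6})]
witnesses = [{1, 2, 5}, {2, 3}]
refined = venn_refine(classes, witnesses)
```

Only the nonempty atoms are kept, so the number of new classes is often well below the worst-case bound $l 2^{s'}$ per tube.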

So far we have only partitioned the edge sets of $\Pi$. With the help of
Theorem 4.10 (the Sparse Regularity Lemma) we will now refine the vertex
equipartition of $\Pi$ to ensure that the new equipartition is $(\varepsilon(l'),p)$-uniform
(and still refines every $Q_g$). By the assumption on $F$, none of the resulting
graphs $F_{xy}^{i,j,k,g}$ has $(D,p)$-dense patches ($F_{xy}^{i,j,k,g}$ is a subgraph of $F$). Thus, we
may apply Theorem 4.10 with $\varepsilon = \varepsilon(l')$, $D = 2$, $t = t$ and $r = |E(H)| t^2 l 2^{s}$,
to the graphs $\{F_{xy}^{i,j,k,g}\}$ with the vertex equipartition given by $\Pi$, where $xy \in
E(H)$, $i, j = 1, \dots, t$, $k = 1, \dots, l(xy, i, j) \le l$ and $g = 1, \dots, s''$. This way we
obtain a finer equipartition of the vertices which, together with the partition
of the subgraphs described above, defines an $(\varepsilon(l'),p)$-uniform $(t', l')$-partition $\Psi$.
This new partition consists of the following parts:

• Vertex sets defined by the equipartition given by Theorem 4.10: 

where -< denotes refinement. 

• Edge sets 

xy xy L x' ' y\ ' 

The new parameter $t'$ is bounded by $T$ of Theorem 4.10 with the above
inputs, and so depends on $l$, $t$, $H$ and the function $\varepsilon(l)$. Starting from
$t = l = 1$ and iterating a bounded number of times, as determined in
the previous subsection, yields the promised constants $T$ and $L$ of Lemma
4.13. We do not attempt to compute any of these constants explicitly. This
completes the proof of Lemma 4.13.

4.4 No Dense Patches 

In the previous section we proved Lemma 4.13. Now we are going to verify
that it can be applied in the setting relevant to the proof of Lemma 2.4. Let
us summarize again the state of affairs at the end of Section 3. As a consequence
of assumption (3) of Theorem 2.1 we have selected a star forest $S$ consisting
of $\nu$ stars with, respectively, $\phi_1, \dots, \phi_\nu$ arms. We set

$$|E(S)| = \phi = \phi_1 + \dots + \phi_\nu$$

and so $|V(S)| = \nu + \phi$. For clarity, we will further assume that $n$ is
divisible by $\nu + \phi$. Under this assumption, Lemma 3.3 provided us with a
partition of the vertices of $G$ into sets of equal size $m = n/(\nu + \phi)$, further
denoted

$$\pi_0 = \{V_1, \dots, V_\nu, W_{11}, \dots, W_{1\phi_1}, \dots, W_{\nu-1,1}, \dots, W_{\nu-1,\phi_{\nu-1}}\}.$$

Based on this partition, we have focused on the set $\mathcal{C}$ of all constellations, that
is, all copies of $S$ in $G$ that are consistent with $\pi_0$. Among the constellations
we have distinguished the set $\mathcal{S}$ of special constellations, related to the special
role of certain additions of a copy of $M$ to $G$.

Recall that $M$ is an arbitrary (balanced) graph with p(M) = 2 with vertices
labeled $x_1, \dots, x_\nu$, and that M is a graph built upon M and consisting
of a number of tepees spreading above the edges of M. It is M which gave
rise to the star forest $S$ by deleting one leg of each tepee (see the Missing
Leg Property, Observation 3.4). The vertices of $S$ not belonging to $M$ (i.e.
the tips of tepees) are denoted by $u_{ab}$, where $a = 1, \dots, \nu$ and $b = 1, \dots, \phi_a$,
in such a way that $u_{ab}$ are the neighbors of $x_a$. The graph $G$ is an arbitrary
member of the family $\mathcal{G}$ defined in Definition 6.1. For all $a = 1, \dots, \nu$ and
$b = 1, \dots, \phi_a$, set $F_{ab} = G[V_a, W_{ab}]$ and

$$F = \bigcup_{ab} F_{ab}.$$

A copy of $S$ in $F$ is consistent if each $x_a$ is mapped onto a vertex of $V_a$ and
each $u_{ab}$ is mapped onto a vertex of $W_{ab}$.

Note that $F \in \mathcal{F}(S,m)$. We want to apply the Subgraph Regularity
Lemma 4.13, with $H = S$, to the graph $F$, family $\mathcal{S}$, and sequences: $p = p(n)$
as in the Setup preceding Lemma 2.4 and $p^* = p^{\phi}$. This normalizing factor $p^*$
is to be expected because there are $\phi$ edges that must appear in $G$ to turn a
constellation into a special constellation (the missing legs of the tepees), and
informally (if $G$ were random), each edge would appear with probability $p$.

In order to verify the assumptions of Lemma 4.13 for $F$, we have to show
that for some $D > 0$ and $D^* > 1$ (independent of $G$, and thus independent
of $F$ and $\mathcal{S}$ in particular),

1(a): $F$ has no $(D,p)$-dense patches

1(b): $F$ is $S$-uniform

2: $F$, together with the given $\mathcal{S}$, has no $(D^*,p^*)$-dense $\mathcal{S}$-patches.

All these properties of $F$ will be derived from Definition 6.1 of the family $\mathcal{G}$ to
which $G$, a supergraph of $F$, belongs. Property 1(a) follows immediately with
$D = 2$ from Property (P5) of $\mathcal{G}$, while property 1(b) is an easy consequence
of Property (P3), that is, of the bounds on the vertex degrees in $G$. Indeed,
by Example 4.9, there are



(consistent) copies of S in F, while a given edge belongs to at most 



n^(2pn)^ = O (-^) = o ( 



log n 



(20) 



of them, for large n. The next lemma establishes assumption 2 with D* = 
2 q + 1, where, recall, g = [lOCV/a] is an upper bound on (see Lemma 
3.1). Recall also that $c_R$ is the number of constellations (i.e., copies of $S$
consistent with $\pi_0$) contained in a subgraph $R$.

Lemma 4.17 For every subgraph $R \subseteq F$ with $c_R \ge \kappa/\log^2 n$ we have
$d_R \le 2^q + 1$.

Let us set $U_a = V(R) \cap V_a$, $R_{ab} = R \cap F_{ab}$ and $\deg_{ab}(v) = \deg_{R_{ab}}(v)$ for
all $a = 1, \dots, \nu$, $b = 1, \dots, \phi_a$. It is straightforward to express $c_R$ in terms of
the degrees $\deg_{ab}(v)$. Indeed, we have

$$c_R = |U_\nu| \prod_{a=1}^{\nu-1} \Bigl( \sum_{v \in U_a} \prod_{b=1}^{\phi_a} \deg_{ab}(v) \Bigr). \qquad (21)$$

This simple formula plays a key role in our proof. First, it trivially implies
that

$$c_R \ge \prod_{a=1}^{\nu} \Bigl( |U_a| \prod_{b=1}^{\phi_a} \delta_{ab} \Bigr), \qquad (22)$$

where $\delta_{ab} = \min_v \deg_{ab}(v)$. Second, in connection with the estimate of $\kappa$,
(21) yields lower bounds on $|U_a|$ and $\Delta_{ab} = \max_v \deg_{ab}(v)$ in terms of $c_R$. We
will use them later in the proof.
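
As an illustration of the counting formula (21) and the lower bound (22), here is a small sketch that evaluates both on toy data and compares (21) with direct enumeration. The data layout (`adj[a][v]` is a list of neighbour sets, one per arm) and all function names are assumptions made for this example; a star with no arms contributes $|U_a|$ through the empty-product convention.

```python
from itertools import product
from math import prod


def count_constellations(adj):
    """Product over stars of the sum over centres of the product of arm degrees, as in (21)."""
    return prod(
        sum(prod(len(nbrs) for nbrs in arms) for arms in centres.values())
        for centres in adj
    )


def brute_force(adj):
    """Enumerate consistent copies: pick one centre per star, then one leaf per arm."""
    total = 0
    for centres in product(*(c.items() for c in adj)):
        leaf_choices = [nbrs for _, arms in centres for nbrs in arms]
        total += sum(1 for _ in product(*leaf_choices))
    return total


def lower_bound(adj):
    """Right-hand side of (22): |U_a| times the minimum degree along each arm."""
    result = 1
    for centres in adj:
        arms_per_centre = list(centres.values())
        n_arms = len(arms_per_centre[0])
        result *= len(centres) * prod(
            min(len(arms[b]) for arms in arms_per_centre) for b in range(n_arms)
        )
    return result


# Toy star forest (hypothetical): two arms, one arm, and a star with no arms.
adj = [
    {"v1": [{1, 2}, {5}], "v2": [{3}, {5, 6}]},
    {"u1": [{7, 8, 9}]},
    {"z1": [], "z2": []},
]
c_R = count_constellations(adj)
```

On this data the formula and the enumeration agree, and the bound from (22) is attained only up to a factor, since the arm degrees are not all equal.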

Proposition 4.18 For all $1 \le a \le \nu$,

(a) $|U_a| = \Omega\bigl(\frac{c_R}{\kappa}\, n\bigr)$, and

(b) $\Delta_{ab} = \Omega\bigl(\frac{c_R}{\kappa}\, np\bigr)$ for each $b = 1, \dots, \phi_a$.



Proof. Using Property (P3) of $\mathcal{G}$, we can further estimate (21) as follows.
For each $1 \le a \le \nu$ and $1 \le b \le \phi_a$ we have



By Example 4.9, we have $\kappa \sim m^{\nu+\phi}p^{\phi}$, which completes the proof. □

In view of (22), to prove Lemma 4.17 we need to prove a matching upper
bound on $s_R$. However, our current proof techniques allow us only to set a
bound of the form

(23)

(see later in this section). So, we can only obtain the desired bound on $d_R$
if $\delta_{ab}$ and $\Delta_{ab}$ are at most a factor of two apart. This leads to the following
definition. A subgraph $R$ of $F$ will be called degree sandwiched if there exist
numbers $r_{ab}$, $a = 1, \dots, \nu$, $b = 1, \dots, \phi_a$, such that for all $v \in U_a$, we have
$r_{ab} \le \deg_{ab}(v) \le 2r_{ab}$.

To put ourselves in a more comfortable position, we will cover $R$ with its
degree sandwiched subgraphs in such a way that each constellation in $R$ is
contained in exactly one of them. Remembering that $\phi \le q$, the following
lemma will imply Lemma 4.17.

Lemma 4.19 For every degree sandwiched subgraph $R \subseteq F$ with $c_R \ge
\kappa/\log^{\phi+3} n$ we have $d_R \le 2^{\phi} + o(1)$.

In order to see how Lemma 4.19 implies Lemma 4.17 we need one more fact.

Proposition 4.20 For every degree sandwiched subgraph $R$ with $c_R < \kappa/\log^{\phi+3} n$
there exists a degree sandwiched subgraph $R'$ such that $R \subseteq R' \subseteq F$ and

$$\kappa/\log^{\phi+3} n \le c_{R'} = (1 + o(1))\,\kappa/\log^{\phi+3} n.$$
Proof. We will construct a sequence of subgraphs $R = R^1 \subseteq R^2 \subseteq \dots \subseteq F$
such that each $R^j$ is degree sandwiched and is obtained from $R^{j-1}$ by adding
at most $2np = O(\sqrt{n})$ edges. Note that by Lemma 3.3, $F$ itself is degree
sandwiched between \mp and |mp. By (20), the increments in c R j are at 
most $O(\kappa/n) = o(\kappa/\log^{\phi+3} n)$, and so there must be some $j$ such that $R' = R^j$
is the desired subgraph. (Note that it is certain that such an index $j$ will be
found many steps before reaching $F$. Indeed, by (22), we will be done when,
for example, all $U_a$ are of order $\Omega(n)$ and all degrees are of order at least
$np/\log n$.)

Our construction is performed in two phases. In phase one, for each $a$ and
$b$, we turn $R \cap F_{ab}$ into an induced subgraph of $F_{ab}$, and then in phase two
we will add one by one all remaining vertices of $\bigcup_a V_a$. A single step of phase
one consists of fixing a vertex $v$ in $U_a$ of minimum degree $r$ in $R^{j-1} \cap F_{ab}$
and adding some $\min(r, \deg_F(v) - r)$ edges $vu \in F_{ab} \setminus R^{j-1}$. This way the
process maintains the "degree-sandwichness" throughout, and is bound to
arrive at $R^j = F_{ab}[U_a]$ for some $j$. A single step of phase two consists of
adding a vertex $v \in \bigcup_a V_a \setminus V(R^j)$ together with all its neighbors in $F$. □

Proof of Lemma 4.17: Let $R \subseteq F$ with $|\mathcal{C}(R)| = c_R \ge \kappa/\log^2 n$. Without
loss of generality assume that for all $a = 1, \dots, \nu$, $b = 1, \dots, \phi_a$, and all
$v \in U_a$, we have $\deg_R(v, W_{ab}) > 0$, and define for each $j = 0, \dots, \lfloor \log \Delta_{ab} \rfloor$

$$U_{ab}^{j} = \Bigl\{ v \in U_a : \frac{\Delta_{ab}}{2^{j+1}} < \deg_{ab}(v) \le \frac{\Delta_{ab}}{2^{j}} \Bigr\}.$$

This subdivides every set $U_a$ into $O(\log^{\phi_a} n)$ classes

$$U_a^{j_1, \dots, j_{\phi_a}} = \bigcap_{b=1}^{\phi_a} U_{ab}^{j_b}$$

so that the degrees in $R$ of all vertices from one class, along every subgraph
$F_{ab}$, are sandwiched between a number and its double. By selecting one
partition class from each set $V_a$ and taking all the edges of $R$ with endpoints
in these classes, this yields a covering of $R$ by $O(\log^{\phi} n)$ subgraphs $R_1, R_2, \dots$
which are all degree sandwiched. Set $d_i = d_{R_i}$, etc., for convenience. Clearly,
we have $c_R = \sum_i c_i$, $s_R = \sum_i s_i$ and

$$d_R = \frac{s_R}{p^* c_R} = \sum_{c_i \ge \kappa/\log^{\phi+3} n} \frac{s_i}{p^* c_R} + \sum_{c_i < \kappa/\log^{\phi+3} n} \frac{s_i}{p^* c_R}.$$

Applying Lemma 4.19 in the first sum, we bound it by $2^{\phi} + o(1)$. For the
second sum, we use a similar idea as in the proof of Proposition 4.12. For
each $i$, let $R'_i$ be the degree sandwiched supergraph of $R_i$ as in Proposition
4.20. Applying Lemma 4.19 to $R'_i$, we obtain

$$s_i \le s_{R'_i} \le (2^{\phi} + o(1))\, p^* c_{R'_i} \le (1 + o(1)) \frac{2^{\phi} p^* \kappa}{\log^{\phi+3} n}$$

and so the second sum, $T$, satisfies

$$T = O\Bigl( \frac{\kappa \log^{\phi} n}{c_R \log^{\phi+3} n} \Bigr) = O(\log^{-1} n) = o(1),$$

by the assumption that $c_R \ge \kappa/\log^2 n$. □
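
The dyadic classification used in the proof above can be sketched in a few lines. This is a simplified variant, stated as an assumption: it buckets by $\lfloor \log_2 \deg \rfloor$ counted upward from 1 rather than downward from $\Delta_{ab}$, which yields the same kind of classes (degrees along each arm within a factor of two). The names `dyadic_classes` and `is_sandwiched` and the sample degrees are hypothetical.

```python
from collections import defaultdict


def dyadic_classes(degrees):
    """Group vertices by the vector of floor(log2(degree)) over all arms.

    `degrees[v]` is a tuple of positive integers, one entry per arm b.
    """
    classes = defaultdict(list)
    for v, degs in degrees.items():
        key = tuple(d.bit_length() - 1 for d in degs)  # floor(log2 d) for d >= 1
        classes[key].append(v)
    return dict(classes)


def is_sandwiched(vertices, degrees):
    """Check that along every arm the degrees lie between some r and 2r."""
    n_arms = len(degrees[vertices[0]])
    return all(
        max(degrees[v][b] for v in vertices) <= 2 * min(degrees[v][b] for v in vertices)
        for b in range(n_arms)
    )


# Toy degrees (hypothetical): four centre vertices, two arms each.
degrees = {"v1": (5, 12), "v2": (7, 9), "v3": (4, 17), "v4": (9, 15)}
classes = dyadic_classes(degrees)
```

With $\phi_a$ arms and degrees at most $n$, the number of possible keys is $O(\log^{\phi_a} n)$, matching the count of classes in the proof.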

The rest of this section is devoted to proving Lemma 4.19. Let $R$ be a
degree sandwiched subgraph of $F$ with $c_R \ge \kappa/\log^{\phi+3} n$. As for its vertex
set, we have, for each $a$ and $b$, $V(R) \supseteq W_{ab}$ and $R_{ab} = R \cap F_{ab}$. The degrees
in $R$ are such that for all $a$ and $b$, and all $v \in U_a$,

$$r_{ab} \le \deg_{R_{ab}}(v) \le 2r_{ab},$$

for some $r_{ab} > 0$, where recall $\deg_{R_{ab}}(v) = \deg_R(v, W_{ab}) = \deg_{ab}(v)$.
In view of (22), to prove Lemma 4.19, it suffices to show that

$$s_R \le (1 + o(1))\,(2p)^{\phi} \prod_{a=1}^{\nu} \Bigl( |U_a| \prod_{b=1}^{\phi_a} r_{ab} \Bigr). \qquad (24)$$

The following corollary of Proposition 4.12 will be of some use.

Corollary 4.21 If $c_R \ge \kappa/\log^{\phi+3} n$, then for all $1 \le a \le \nu$,

(a) $|U_a| \ge \dfrac{n}{\log^{\phi+4} n}$, and

(b) $r_{ab} \ge \dfrac{np}{\log^{\phi+4} n}$ for each $b = 1, \dots, \phi_a$.

□
Let us now briefly outline the rest of the proof of Lemma 4.19: we want
to bound the number of special constellations in $R$, or more specifically we
want to prove (24), where $U_a = V(R) \cap V_a$. Recall from Section 3 that every
special constellation is of the form $S(X)$ for a set of vertices $X \in \mathcal{X}_3$ which
contains one vertex from each set $V_a$. Such a constellation corresponds via
the Missing Leg Property (Observation 3.4) to a set of $\phi$ edges of $G$ (the
missing legs). These edges, together with the edges of the constellation itself,
form a set of tepees over the edges of $M_X$, a copy of $M$ planted at $X$ (see
Figure 3).

We will construct a $\nu$-partite auxiliary graph $A$ on the vertex sets $U_1, \dots, U_\nu$
that actually contains these graphs $M_X$ for which $S(X) \subseteq R$, but possibly
also other copies of $M$. Hence, the number of consistent copies of $M$ in $A$
(such that $x_a$ is mapped onto a vertex of $U_a$) will be an upper bound on the
number of special constellations in $R$.

More precisely, for every pair $a, a'$ for which $x_a x_{a'} \in E(M)$ define the set
$T_{aa'}$ of indices $b$ such that $x_a u_{ab} x_{a'}$ is a tepee in $M$. Formally, using notation
$a'(a, b)$ from Section 3,

$$T_{aa'} = \{b : a' = a'(a,b)\}. \qquad (25)$$

We will draw an edge in $A$ joining $v \in U_a$ with $v' \in U_{a'}$, if for each $b \in T_{aa'}$
the vertex $v'$ is connected by an edge of $G$ with a vertex of $W_{ab}$ which is a
neighbor of $v$ in $R$. If each bipartite subgraph $A_{aa'} = A[U_a, U_{a'}]$ was highly
regular, then using Proposition 4.3 it would be easy to count the number of
copies of $M$ in $A$.

To achieve this high regularity of $A_{aa'}$ we will consider its supergraph
$A'_{aa'}$ with the entire $V(G)$ in place of $U_{a'}$. Then the degree of $v \in U_a$ in $A'_{aa'}$
will be asymptotically determined, via Property (P6) of $\mathcal{G}$, by the sizes of its
neighborhoods in $W_{ab}$ for all $b \in T_{aa'}$. We know that for distinct $v$ they are
all sandwiched between $r_{ab}$ and $2r_{ab}$, but may vary, obstructing the desired
regularity. To remedy this problem we will enlarge these neighborhoods so
that they are all equal to $2r_{ab}$. Since we still want to keep the co-degrees fairly
low (as they are by Property (P4) of $\mathcal{G}$), we will make these enlargements
randomly. Only after this is done, we will formally define the auxiliary graph,
prove that it is highly regular and finally count copies of $M$ in it.

Fix $a$, $b$ and set $r = r_{ab}$. Let $k = |U_a|$, $U_a = \{v_1, \dots, v_k\}$, and $N_i = N_i(b)$
be the neighborhood of $v_i$ in $R_{ab}$. Note that $r \le |N_i| \le 2r$ and, by Definition
6.1 (P4), for all $i \ne j$, $|N_i \cap N_j| \le 3 \log n$. Note also that by Corollary 4.21
we have, say, $r > n^{0.49}$. The following lemma gives us exactly what we need.

Lemma 4.22 For a sufficiently large integer $n$, let $N_1, \dots, N_k$, $k \le n$, be
a set system of subsets of $W$, where $\theta_1 n \le |W| \le n$, for all $i$, $n^{0.49} \le r \le
|N_i| \le 2r \le \theta_2 \sqrt{n}$, and for all $i \ne j$, $|N_i \cap N_j| \le 3 \log n$. Then there exist
sets $\bar{N}_i$, $i = 1, \dots, k$, such that for all $i$

(a) $N_i \subseteq \bar{N}_i$

(b) $|\bar{N}_i| = 2r$

(c) for all $i \ne j$, $|\bar{N}_i \cap \bar{N}_j| \le 12 \log n$.

Proof: Pick $\tilde{N}_1, \dots, \tilde{N}_k$ randomly, uniformly and independently from $[W]^{2r}$,
the family of $2r$-element subsets of $W$. Then with probability $1 - o(1)$,

$$|\tilde{N}_i \cap \tilde{N}_j| \le 3 \log n \ \ \forall i < j \quad \text{and} \quad |N_i \cap \tilde{N}_j| \le 3 \log n \ \ \forall i, j. \qquad (26)$$

This is because for a fixed set $N \subseteq W$, and a random set $\tilde{N} \subseteq W$, $|\tilde{N}| = 2r$,
the random variable $Z = |N \cap \tilde{N}|$ has hypergeometric distribution with
expectation $|N||\tilde{N}|/|W| \le 4\theta_2^2/\theta_1$. By Chernoff's inequality (e.g., (2.11) and
Thm. 2.10 from [16]),

$$\Pr(Z \ge 3 \log n) \le e^{-3 \log n} = n^{-3}.$$

As there are fewer than $2n^2$ pairs of sets $(\tilde{N}_i, \tilde{N}_j)$, $i < j$, and $(N_i, \tilde{N}_j)$, the
probability that at least one pair will violate (26) is less than $2/n \to 0$. From
(26), clearly,

$$|(N_i \cup \tilde{N}_i) \cap (N_j \cup \tilde{N}_j)| \le 12 \log n.$$

Since $3r - 3 \log n > 2r$ for large $n$, the required sets $\bar{N}_1, \dots, \bar{N}_k$ can now be
chosen arbitrarily in such a way that $|\bar{N}_i| = 2r$ and that $N_i \subseteq \bar{N}_i \subseteq N_i \cup \tilde{N}_i$.
Then

$$\bar{N}_i \cap \bar{N}_j \subseteq (N_i \cup \tilde{N}_i) \cap (N_j \cup \tilde{N}_j),$$

and thus (c) holds as well. □
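
The random enlargement in this proof is easy to simulate. The sketch below is an illustration under assumptions, not the authors' procedure verbatim: it pads each set with elements of an independent random $2r$-subset of the ground set until the size is exactly $2r$. The function name `extend_sets`, the seed and the toy sets are hypothetical, and the code does not verify the intersection bound (c), which holds only with high probability for large $n$.

```python
import random


def extend_sets(sets, ground, r, seed=0):
    """Extend each set N_i (with r <= |N_i| <= 2r) to a superset of size exactly 2r.

    The padding for N_i is taken from a uniformly random 2r-subset of `ground`,
    so the result lies inside the union of N_i and that random subset.
    """
    rng = random.Random(seed)
    ground = sorted(ground)
    extended = []
    for n_i in sets:
        if not r <= len(n_i) <= 2 * r:
            raise ValueError("each set must have between r and 2r elements")
        helper = rng.sample(ground, 2 * r)  # the random 2r-subset
        padding = [w for w in helper if w not in n_i]
        extended.append(set(n_i) | set(padding[: 2 * r - len(n_i)]))
    return extended


# Toy instance (hypothetical sizes, far below the asymptotic regime of the lemma).
W = range(200)
sets = [set(range(0, 5)), set(range(4, 11)), set(range(20, 30))]
ext = extend_sets(sets, W, r=5, seed=1)
```

The padding always suffices because the random subset has $2r$ elements, of which at most $|N_i|$ can already lie in $N_i$.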

Let $\bar{N}_i(b)$ be the extensions of the neighborhoods $N_i(b)$ of vertices $v_i \in U_a$ in
$W_{ab}$ whose existence is guaranteed by Lemma 4.22. We now define the auxiliary
graphs. For every $aa'$ such that $x_a x_{a'} \in E(M)$ let $A'_{aa'} = (U_a, V(G), E'_{aa'})$,
where for $v_i \in U_a$, $v' \in V(G)$:

$$v_i v' \in E'_{aa'} \iff |\bar{N}_i(b) \cap N_G(v')| \ge 1 \text{ for all } b \in T_{aa'}.$$

Figure 6: Enlarging the $N_i$'s



Remark 4.8 Since $U_a \subseteq V(G)$, the above definition involves a slight abuse
of notation. Formally we are creating a graph between a copy of $U_a$ and
$V(G)$. We use $V(G)$ and not $V(G) \setminus U_a$ for technical reasons. However, we
will momentarily consider the subgraph of $A'_{aa'}$ induced between $U_a$ and $U_{a'}$,
so that this conflict will be resolved.

Remark 4.9 In the case that $T_{aa'} = \emptyset$ we take $A'_{aa'}$ to be the complete
bipartite graph, as the condition in the definition is fulfilled vacuously.

Next define the bipartite subgraph $A_{aa'} \subseteq A'_{aa'}$ as the induced graph between
the vertex sets $U_a$ and $U_{a'}$, and finally let

$$A = \bigcup_{x_a x_{a'} \in E(M)} A_{aa'}.$$

See Figure 7 below for an illustration of $A$. Our definition of $A_{aa'}$ guarantees
that every special constellation in $G$ yields a (not necessarily induced)
subgraph of $A$ isomorphic to $M$. More formally let

$$\#(M, A) = \text{the number of (consistent) copies of } M \text{ in } A.$$
Proposition 4.23

$$s_R \le \#(M, A).$$

Proof: This follows immediately from the definition of $A$ and the Missing
Leg Property. Every special constellation $S(X)$ has exactly one vertex in each
of the sets $U_a$. The definition of $A$ is designed so that a copy of $M$ is placed
on $X$ if condition (a) of the Missing Leg Property holds for $S(X)$. Note also
that every copy of $M$ corresponds to at most one special constellation, else
the corresponding copy of $S(X)$ would violate condition (b) or (c) of the
Missing Leg Property. □



Figure 7: The auxiliary graph A



Remark 4.10 Note that most likely there will be many "fake" copies of $M$
in $A$, that is, copies of $M$ which do not correspond to any special constellation
$S(X)$. There are two reasons for this overcount. First, we may create a copy
of $M$ in $A$ over a set $X \notin \mathcal{X}_3$, or for which $X$ is not isomorphic to $M$, but
rather contains $M$ as a proper subgraph. Second, in verifying the definition
of an edge in $A$ we may have used the vertices of $\bar{N}_i(b) \setminus N_i(b)$, which, in fact,
are not neighbors of $v_i$.

We will show now that $A_{aa'}$ inherits from $G$ some random-like properties.
This in turn will allow us to use Proposition 4.3 to easily estimate the number
of copies of $M$ in $A$. Targeting Property (P6) of $\mathcal{G}$, for each $aa'$ with $x_a x_{a'} \in
E(M)$, define

$$\gamma_{aa'} = \prod_{b \in T_{aa'}} \bigl( 1 - (1-p)^{2r_{ab}} \bigr),$$

and note that

$$(1+2C^2)^{-\phi} \prod_{b \in T_{aa'}} 2pr_{ab} \le \prod_{b \in T_{aa'}} \frac{2pr_{ab}}{1+2pr_{ab}} \le \gamma_{aa'} \le \prod_{b \in T_{aa'}} 2pr_{ab}, \qquad (27)$$

where the leftmost inequality follows from the fact that $2pr_{ab} \le p(2np) \le
2C^2$. For our next lemma recall the notation $x \sim y$ defined at the end of Section 1.
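
The chain of inequalities (27) rests on the elementary bounds $x/(1+x) \le 1-(1-p)^k \le x$ with $x = kp$. A quick exact check on sample values, using rational arithmetic to avoid rounding issues, might look as follows. The parameter values ($p = 1/100$, $C = 1$, the radii) are hypothetical and chosen so that $2pr_{ab} \le 2C^2$; the exponent uses the number of factors, which is at most $\phi$.

```python
from fractions import Fraction
from math import prod


def gamma(p, radii):
    """Product over b of 1 - (1 - p)^(2 r_b), the density parameter defined above."""
    return prod(1 - (1 - p) ** (2 * r) for r in radii)


def bounds_27(p, radii, C):
    """Return the leftmost, middle and rightmost products appearing in (27)."""
    xs = [2 * p * r for r in radii]
    lower = prod(xs) / (1 + 2 * C * C) ** len(xs)
    middle = prod(x / (1 + x) for x in xs)
    upper = prod(xs)
    return lower, middle, upper


# Sample values (hypothetical): n = 10_000, p = C / sqrt(n) with C = 1, radii at most np.
p = Fraction(1, 100)
radii = [30, 80]
g = gamma(p, radii)
lower, middle, upper = bounds_27(p, radii, C=1)
```

Using `Fraction` keeps every comparison exact, so the check does not depend on floating-point tolerance.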

Lemma 4.24 For each $aa'$ such that $x_a x_{a'} \in E(M)$, the graph $A_{aa'}$ is $n^{-1/30}$-regular
with density

$$d(A_{aa'}) \overset{n^{-1/30}}{\sim} \gamma_{aa'}.$$

Proof: Our main tools will be Lemma 4.1 and Property (P6) of Definition
6.1. This property implies that for all $v_i, v_j \in U_a$, the degrees $\deg(v_i)$
and co-degrees $\mathrm{codeg}(v_i, v_j)$ in $A'_{aa'}$ are quite close to their averages. More
precisely, let $T_{aa'} = \{b_1, \dots, b_g\}$. Applying Property (P6) first with $k = g$
and $S_l = \bar{N}_i(b_l)$, $l = 1, \dots, k$, and then again with $k = 2g$, $S_l = \bar{N}_i(b_l)$ and
$S_{g+l} = \bar{N}_j(b_l)$, $l = 1, \dots, g$, we get, respectively,

(I) $\forall v \in U_a$: $|\deg(v) - n\gamma_{aa'}| < n^{4/5}$, and
(II) $\forall v, v' \in U_a$: $|\mathrm{codeg}(v, v') - n\gamma_{aa'}^2| < n^{4/5}$.

Properties (I) and (II) imply, respectively, (i) and (ii) of Lemma 4.1 with
$d = \gamma_{aa'}$ and, say, $\varepsilon = n^{-11/60}$ (and with $W = [U_a]^2$). Indeed, by (I),

$$\deg(v) \ge n\gamma_{aa'} - n^{4/5} = n(\gamma_{aa'} - n^{-1/5}) \ge n(d - \varepsilon),$$

which is precisely condition (i) of Lemma 4.1. Also, by Corollary 4.21(b),
and the above lower bound (27) on $\gamma_{aa'}$,

1 / np 2 V 1 ^ 

> n 



laa , > (i+2c 2 r^ n 2 Prab > J- ( np2 y > 

J -- L logn \ (lognr +4 / 



logn (logn)<^+ 4 ) 
and so

Thus, by (II),

$$\mathrm{codeg}(v, v') \le n\gamma_{aa'}^2 + n^{4/5} = n(\gamma_{aa'}^2 + n^{-1/5}) \le n(d + \varepsilon)^2,$$

and so, condition (ii) of Lemma 4.1 is true for all pairs $v, v'$. Moreover,
Corollary 4.21(a) implies that for all $1 \le a \le \nu$

$$|U_a| \ge \frac{n}{\log^{\phi+4} n} \ge 2n^{11/60} = \frac{2}{\varepsilon}.$$

Hence, by Lemma 4.1 the graph $A'_{aa'}$ is $(16n^{-11/60})^{1/5}$-regular. Recall that
$A_{aa'} \subseteq A'_{aa'}$ is the induced (bipartite) subgraph of $A'_{aa'}$ on the vertex set $U_a \cup
U_{a'}$. Therefore, we infer by Observation 4.2 with $\varepsilon' = |U_{a'}|/n \ge (\log n)^{-\phi-4}$,
that $A_{aa'}$ is $n^{-1/30}$-regular. □

Proof of Lemma 4.19: In view of Lemma 4.24, we apply Proposition
4.3 with $k = \nu$ and $B_{aa'} = A_{aa'}$ if $x_a x_{a'} \in E(M)$ and otherwise $B_{aa'}$ being
the complete bipartite graph between $U_a$ and $U_{a'}$, so that $\#(M, A) =
\#(K_\nu, \bigcup_{aa'} B_{aa'})$. Hence,

$$\#(M, A) = (1 + o(1)) \prod_{a=1}^{\nu} |U_a| \prod_{aa'} \gamma_{aa'} \le (1 + o(1))\,(2p)^{\phi} \prod_{a=1}^{\nu} \Bigl( |U_a| \prod_{b=1}^{\phi_a} r_{ab} \Bigr),$$

which, by Proposition 4.23, proves (24) and thus completes the proof of
Lemma 4.19. □



5 The Core Section (Proof of Lemma 2.4) 

In this section we will define the notion of a core, show that every hitting set
of the family $\mathcal{S}$ of all special constellations contains a core, prove that there
are few cores and that every core is large. In other words, we shall prove
Lemma 2.4. Fix an arbitrary $\tau > 0$.
5.1 What is a Core?



Cores will be defined through a $\delta$-regular partition $\Pi_1$ of $G$ guaranteed by
Lemma 4.13. To make this definition more user-friendly, after applying the
regularity lemma we will give suggestive names to all the parts of the final
partition, and only then define cores formally.

5.1.1 Applying the Regularity Lemma 

Let $\alpha_3$ and $q$ be as defined in Section 3, and constants $\nu$, $c$, $C$ as in the Setup
on page 16 (see also the Glossary). For an arbitrary $\tau > 0$, let $\eta$ be any
constant satisfying $0 < \eta < 1$ and

T 

ijlog(l/rj) < 

Furthermore, let us set 



and 




For each star forest $S$ with at most $q$ edges there are constants $T_0 =
T_0(S)$, $L_0 = L_0(S)$ and $n_0 = n_0(S)$ determined by Lemma 4.13 with $D = 2$,
$D^* = 2^q + 1$, the above defined $\delta$ and $\varepsilon(l)$, and with $H = S$. The meaning
of these constants is that for every $n \ge n_0$, for all $0 < p, p^* \le 1$, all
$S$-uniform $S$-graphs $F$ with $|V(F)| = n$ and no $(2,p)$-dense patches, and for
all $\mathcal{S} \subseteq \mathcal{C}(F)$ with no $(2^q + 1, p^*)$-dense $\mathcal{S}$-patches, there exists a refinement $\Pi_1$
of the initial partition $\Pi_0$. This refined partition $\Pi_1$ is a $\delta$-regular, $(\varepsilon(l),p)$-uniform
$(t, l)$-partition for some $l \le L_0$ and $t \le T_0$. Set $T_1 = \max_S T_0(S)$,
$L_1 = \max_S L_0(S)$ and $n_1 = \max_S n_0(S)$.

Let $G \in \mathcal{G}$ be as in Lemma 2.4 (in particular, $|V(G)| = n \ge n_1$) and let
$F$ be a subgraph of $G$ determined by the partition $\pi_0$. Furthermore, let
$\mathcal{S}$ be the family of special constellations in $F$ generated by assumption (3) of
Theorem 2.1 as shown in Section 3. Recall that $S$, a star forest with $\nu$ stars
and $\phi \le q$ arms altogether, is the isomorphism type of the constellations.
For clarity, we will further assume that $n$ is divisible by $\nu + \phi$, so that
the initial equipartition $\pi_0$ has, in fact, all sets of equal size. We will apply
Lemma 4.13 to the pair $(F, \mathcal{S})$ with $p = p(n)$ as in the Setup on page 16 and
$p^* = p^{\phi}$. We have proved in Section 4.4 that in such a setting $(F, \mathcal{S})$ does fulfill
all the above assumptions of Lemma 4.13. For the purpose of regularization,
set $F_{ab} = G[V_a, W_{ab}]$ and let $\Pi_0$ denote the initial partition made of $\pi_0$, the
partition of vertices, and of the edge sets between the corresponding vertex
sets. In other words, the elements of $\Pi_0$ are:

• vertex sets: $V_a$, $a = 1, \dots, \nu$; $W_{ab}$, $b = 1, \dots, \phi_a$, for $a = 1, \dots, \nu - 1$.

• edge sets: $E(F_{ab})$, $a = 1, \dots, \nu$, $b = 1, \dots, \phi_a$.

Finally, let



$$F = \bigcup_{ab} F_{ab}$$

and note that $F \in \mathcal{F}(S, m)$, where

$$m = \frac{n}{\nu + \phi} = |V_a| = |W_{ab}|.$$

Let $\Pi_1$ be the final $(\varepsilon(l),p)$-uniform, $\delta$-regular $(t, l)$-partition, $t \le T_1$,
$l \le L_1$, guaranteed by Lemma 4.13. It consists of vertex equipartitions
$V_a = V_a^1 \cup \dots \cup V_a^t$, $a = 1, \dots, \nu$, and $W_{ab} = W_{ab}^1 \cup \dots \cup W_{ab}^t$, $a = 1, \dots, \nu - 1$,
$b = 1, \dots, \phi_a$, and graph partitions

$$F_{ab}^{ij} = G[V_a^i, W_{ab}^j] = \bigcup_{k=1}^{l(ab,i,j)} F_{ab}^{ijk}, \quad a = 1, \dots, \nu - 1, \ b = 1, \dots, \phi_a, \ l(ab,i,j) \le l, \ i, j = 1, \dots, t.$$

For simplicity, we assume that $m$ is divisible by $t$ and thus for all $a$, $b$, $i$ and
$j$ we have

$$|V_a^i| = |W_{ab}^j| = \frac{m}{t}.$$

The number of edges in different graphs $F_{ab}^{ijk}$ may vary. Note that $F =
\bigcup_{a,b,i,j,k} F_{ab}^{ijk}$.
Remark 5.1 The assumption that $t$ divides $m$ is not very serious. Since
$t \le T_1$, every $m$ which is divisible by $T_1!$ is also divisible by $t$, no matter
what $t$ turns out to be. Thus, our proof is absolutely formal for all $m$ which
are multiples of $T_1!$, that is, for all $n$ which are multiples of $T_1!(\nu + \phi)$.
The argument for other values of $n$ is essentially the same with only some
obvious, though tedious, notational complications.



5.1.2 Anatomy and Polyads 

We use suggestive terminology throughout this section. The definitions of
tubes and polyads given below are equivalent to those introduced in Section 4
in a more general setting of $H$-graphs. Figure 8 illustrates these notions.

• The sets $V_a^i$ will be called palms.

• The sets $W_{ab}^j$ will be called fingernails.

• The subgraphs $F_{ab}^{ijk}$ will be called fingers.

• The induced subgraphs $F_{ab}^{ij} = \bigcup_{k} F_{ab}^{ijk}$ will be called tubes.

• A polyad $P$ consists of $\nu$ palms $V_a^P$ (one for each $a$), $\phi$ fingernails $W_{ab}^P$
($\phi_a$ for each $a$), and $\phi$ fingers $F_{ab}^P$ (one from each tube $F_{ab}^{ij}$, where
$V_a^P = V_a^i$ and $W_{ab}^P = W_{ab}^j$).

• An olympiad $O$ consists of $\nu$ palms $V_a^O$ (one for each $a$), $\phi$ fingernails
$W_{ab}^O$ ($\phi_a$ for each $a$), and $\phi$ tubes $F_{ab}^O = F_{ab}^{ij}$, where $V_a^O = V_a^i$ and
$W_{ab}^O = W_{ab}^j$.

• The partition $\Pi_1$ will be called the myriad as it contains all the elements
of the puzzle, that is, it consists of all $t\nu$ palms, $t\phi$ fingernails, and at
most $t^2 l \phi$ fingers.

Note that there are precisely $t^{\nu+\phi}$ olympiads in the myriad, and at most
$l^{\phi}$ polyads in any given olympiad. Throughout, we will treat the olympiads
(and the myriad) both as a set of polyads and as a subgraph of $F$ (which is
the union of those polyads).
Figure 8: The Anatomy

5.1.3 The Definition of a Core

Let $\varepsilon = \varepsilon(l)$. We say that a polyad $P$ is nice if all the fingers in $P$ are
$(\varepsilon, p)$-regular, and have $p$-density at least $3\varepsilon^{1/(2\phi)}$. We say that a polyad $P$ is
healthy if the following three conditions hold:

• $P$ is nice,

• $P$ is $\delta$-regular,

• $d_P > \delta$.

We define precores first. A precore will be a subgraph which contains a
substantial part of every healthy polyad $P$. Roughly speaking these parts
will be formed by choosing at least a half of the vertices of just one palm $V_a^P$,
and for each chosen vertex $v \in V_a^P$ choosing a finger $F_{ab}^P$, where $b = b(v)$, and taking
at least a $(1 - \eta)$-fraction of the $\deg_P(v, W_{ab}^P) = \deg_{F_{ab}^P}(v)$ edges incident
to $v$ in that finger, where $\eta$ is defined at the beginning of Section 5.1.1.
More formally, denote the set of all healthy polyads by $\mathcal{P}^{healthy}$. For each
$P \in \mathcal{P}^{healthy}$ define a family $\mathrm{PRECORE}^P$ of subgraphs of $P$ as follows: $J \in
\mathrm{PRECORE}^P$ if $J \subseteq P$ and

(i) $\exists\, 1 \le a_J \le \nu$ such that $|V(J) \cap V_a^P| = 0$ for $a \ne a_J$, and
$|V(J) \cap V_a^P| \ge 0.5\,|V_a^P|$ for $a = a_J$;

(ii) $\forall v \in V(J) \cap V_{a_J}^P$ $\exists\, b(v) \in \{1, \dots, \phi_{a_J}\}$ such that
$\deg_J(v, W_{a_J b}^P) = 0$ for $b \ne b(v)$, and
$\deg_J(v, W_{a_J b}^P) \ge (1 - \eta) \deg_P(v, W_{a_J b}^P)$ for $b = b(v)$.

Now we are ready to define the family PRECORE of subgraphs of $F$:

$$\mathrm{PRECORE} = \Bigl\{ J : J = \bigcup_{P \in \mathcal{P}^{healthy}} J^P, \ J^P \in \mathrm{PRECORE}^P \Bigr\}. \qquad (28)$$

The graphs in PRECORE are called precores. Thus, a precore $J$ is a union
of subgraphs $J^P$, one for every healthy polyad $P$.
We are one step from the definition of a core.

Recall that the (pre)coloring $\sigma$ of all the edges of $\bigcup_{X \in \mathcal{X}_3} S(X)$, that
is, of all the edges of special constellations, is determined by a fixed triangle-free
coloring $\sigma'$ of the graph $M$. For any precore $J$ contained in $\bigcup_{X \in \mathcal{X}_3} S(X)$,
the coloring $\sigma$ partitions its edges into $J_{red}$ and $J_{blue}$. Define $\mathrm{Maj}(J)$ to be
the bigger of the two sets (ties are broken arbitrarily). Then

$$\mathrm{CORE} = \Bigl\{ \mathrm{Maj}(J) : J \in \mathrm{PRECORE}, \ J \subseteq \bigcup_{X \in \mathcal{X}_3} S(X) \Bigr\}.$$

The graphs in CORE are called cores. The reader is encouraged to draw an
analogy between the above definition and that given in Section 2.2.1.
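
The majority operation used to pass from a precore to a core is simple enough to state in code. The sketch below assumes edges are tuples and the colouring is a dictionary with values "red" and "blue"; the function name `maj` and the sample data are hypothetical. Ties are broken in favour of red here, which is one arbitrary choice among those the definition allows.

```python
def maj(edges, colour):
    """Return the larger colour class of `edges` under `colour` (ties go to red)."""
    red = {e for e in edges if colour[e] == "red"}
    blue = set(edges) - red
    return red if len(red) >= len(blue) else blue


# Toy precore (hypothetical) with a two-colouring of its edges.
J = {(1, 2), (2, 3), (3, 4)}
colour = {(1, 2): "red", (2, 3): "blue", (3, 4): "red"}
core = maj(J, colour)
```

Whatever the tie-breaking rule, the returned class contains at least half of the edges, which is why a lower bound on the size of precores transfers to cores with a loss of a factor of two.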

In the forthcoming subsections we will verify that the family CORE de- 
fined above satisfies conditions (a-c) of Lemma 2.4. 



5.2 Every Hitting Set of S Contains a Core 

We have come to the heart of our construction. The following lemma, to- 
gether with Lemma 3.5, immediately implies Lemma 2.4(a) - see below. Re- 
call that a hitting set of a family of graphs is a set of edges which intersects 
the edge set of every graph in the family. 
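
The notion of a hitting set recalled above can be checked mechanically on small examples. This is a minimal sketch assuming each graph of the family is given as a set of edges; the names `is_hitting_set` and `missed_members` and the toy family are hypothetical.

```python
def is_hitting_set(T, family):
    """True if the edge set T meets the edge set of every member of the family."""
    return all(bool(set(T) & set(member)) for member in family)


def missed_members(T, family):
    """Members of the family that are edge-disjoint from T."""
    return [member for member in family if not set(T) & set(member)]


# Toy family (hypothetical) of three small edge sets.
family = [{(1, 2), (2, 3)}, {(2, 3), (3, 4)}, {(4, 5)}]
T_good = {(2, 3), (4, 5)}
T_bad = {(1, 2)}
```

A set fails to be a hitting set exactly when some member of the family is disjoint from it, which is the form in which the property is used in the proof of Lemma 5.1.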
Lemma 5.1 Let $T$ be a hitting set of the family $\mathcal{S}$ of special constellations.
Then there exists a precore $J \in \mathrm{PRECORE}$ such that $J \subseteq T$.

Proof: Consider an arbitrary subset $T$ of $E(F)$ not containing any precore
$J \in \mathrm{PRECORE}$. We are going to show that there is a special constellation
disjoint from $T$. Because $T$ contains no precore, there exists $P_0 \in \mathcal{P}^{healthy}$
such that for all $J \in \mathrm{PRECORE}^{P_0}$ we have $J \not\subseteq T$ (otherwise, $T$ would
contain a union as in (28)). That is, for all $a = 1, \dots, \nu$, the set

$$\tilde{V}_a^{P_0} = \{ v \in V_a^{P_0} : \forall b = 1, \dots, \phi_a : \deg_{T \cap F_{ab}^{P_0}}(v) < (1 - \eta) \deg_{F_{ab}^{P_0}}(v) \}$$

has size

$$|\tilde{V}_a^{P_0}| > \frac{m}{2t}.$$

In other words, more than half of the vertices of each palm retain in $T$ at
most a $(1 - \eta)$-fraction of the neighbors they had within each finger of $P_0$.
Defining $Q_0 = P_0 - T$, we see that for all $a = 1, \dots, \nu$, if $v \in \tilde{V}_a^{P_0}$ then for
all $b = 1, \dots, \phi_a$

$$\deg_{Q_0 \cap F_{ab}^{P_0}}(v) > \eta \deg_{F_{ab}^{P_0}}(v).$$

Recall that for a subgraph $R$ of $F$, symbols $c_R$ and $s_R$ stand, respectively,
for the numbers of constellations and special constellations contained in $R$.
Let $\tilde{Q}_0$ and $\tilde{P}_0$ be induced subgraphs of, respectively, $Q_0$ and $P_0$ with sets
$V_a^{P_0}$ restricted to $\tilde{V}_a^{P_0}$. Then, trivially $c_{Q_0} \ge c_{\tilde{Q}_0}$ and, by formula (21),

$$c_{\tilde{Q}_0} \ge \eta^{\phi} c_{\tilde{P}_0}.$$

Since $P_0$ is nice, all fingers $F_{ab}^{P_0}$ are $(\varepsilon,p)$-regular with $p$-density at least
$3\varepsilon^{1/(2\phi)}$. Moreover, by Observation 4.6, all subfingers $F_{ab}^{P_0} \cap \tilde{P}_0$ are $(2\varepsilon,p)$-regular
with $p$-density at least $3\varepsilon^{1/(2\phi)} - \varepsilon > (2\varepsilon)^{1/(2\phi)}$ and also $d_p(F_{ab}^{P_0}) \sim d_p(F_{ab}^{P_0} \cap \tilde{P}_0)$.
Hence, by two applications of Proposition 4.8, to $P_0$ and to $\tilde{P}_0$,

$$\frac{c_{\tilde{P}_0}}{c_{P_0}} \ge (1 - \varepsilon^{1/4})(0.5)^{\nu} > (0.4)^{\nu},$$

and consequently,

$$c_{Q_0} \ge c_{\tilde{Q}_0} \ge \eta^{\phi} c_{\tilde{P}_0} \ge (0.4)^{\nu} \eta^{\phi} c_{P_0} \ge \delta c_{P_0},$$

by our definition of $\delta$. Therefore, since $P_0$ is a $\delta$-regular polyad of $(\mathcal{S},p^*)$-density
greater than $\delta$, the density $d_{Q_0}$ is positive,
implying that $s_{Q_0} \ge 1$. This means that there exists a promised special
constellation $S \in \mathcal{S}$ such that $S \subseteq Q_0 \subseteq F - T$. □
Proof of Lemma 2.4(a): Note that for every triangle-free coloring $\chi$ of $G$
the set $\mathrm{Agree}(\chi, \sigma)$ is a hitting set of the family $\mathcal{S}$. Thus, by Lemma 5.1,
there is a precore $J$ contained in $\mathrm{Agree}(\chi, \sigma)$. Since the latter set is contained
in $\bigcup_{X \in \mathcal{X}_3} S(X) \subseteq F$, the set $\mathrm{Maj}(J)$ is a monochromatic core under $\chi$.

□



5.3 There Are Few Cores 

Now we give a quick proof of Lemma 2.4(b). 
Lemma 5.2 $|\mathrm{CORE}| \le \exp\{\tau n^{3/2}\}$.

Proof: We will actually show that $|\mathrm{PRECORE}| \le \exp\{\tau n^{3/2}\}$, which is
stronger. It follows from the definition of precores that for all $J \in \mathrm{PRECORE}$,
all $v \in V(J) \cap \bigcup_{a=1}^{\nu} V_a$, and for all $a$, $b$, $i$, $j$ and $k$, either

$$\deg_{J \cap F_{ab}^{ijk}}(v) = 0, \quad \text{or} \quad \deg_{J \cap F_{ab}^{ijk}}(v) \ge (1 - \eta) \deg_{F_{ab}^{ijk}}(v).$$

We will bound the number of all subgraphs of $F$ with that property. For
every vertex $v \in V_a$ there are at most $2^{\phi t l}$ choices of the "substantial" fingers
$F_{ab}^{ijk}$ along which $v$ has positive degree. For each choice of substantial fingers,
if the degree of $v$ in their union is $r$, then the number of choices of neighbors
of $v$ in $J$ (which is also the number of choices of the non-neighbors) is at
most

$$\sum_{k \le \eta r} \binom{r}{k} \le \eta r \binom{r}{\eta r} \le n \binom{2pn}{\eta 2pn},$$

by Property (P3) of $\mathcal{G}$. Thus

$$|\mathrm{PRECORE}| \le \Bigl( 2^{\phi t l}\, n \binom{2pn}{\eta 2pn} \Bigr)^{n} \le (2^{\phi t l} n)^{n} \Bigl( \frac{e}{\eta} \Bigr)^{2C\eta n^{3/2}} \le e^{\tau n^{3/2}}$$

for $n$ sufficiently large, by the definition of $\eta$. □
5.4 Cores Are Large

In this section we will show the most difficult part of Lemma 2.4, that is, 
part (c). We will prove it with 



A 



160 \D*q(v + q) 
where 

a ± = 3 J 2g a 3 and D* = 2 q + 1. 

Recall that olympiads were defined in Section 5.1.2 and can be viewed as
sets of polyads. The plan is to cover a precore $J$ by subgraphs arising from
different olympiads:

$$J = \bigcup_{O \in \Pi_1} J_O, \quad \text{where} \quad J_O = \bigcup_{P \in O^{healthy}} J^P \quad \text{and} \quad O^{healthy} = O \cap \mathcal{P}^{healthy},$$

and then to show (see Lemma 5.6 below) that for a certain type of olympiads,
called perfect, we have

$$|J_O| \ge \frac{\alpha_4\, p\, m^2}{10 D^* \phi^2 t^2}.$$

We will prove that there are at least $(\alpha_4/8D^*)\, t^{\nu+\phi}$ perfect olympiads (see
Lemma 5.4 below). This and the obvious fact that an edge may belong to at
most $t^{\nu+\phi-2}$ olympiads, yields the following result.

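Lemma 5.3 For all $J \in \mathrm{PRECORE}$ we have $|J| \ge 2\lambda pn^2$.

Proof: By Lemmas 5.4 and 5.6 we have

$$|J| \ge \frac{\alpha_4 t^{\nu+\phi}}{8D^*} \times \frac{\alpha_4\, p\, m^2}{10 D^* \phi^2 t^2} \times t^{-(\nu+\phi-2)} \ge 2\lambda pn^2.$$

□

Lemma 5.3, together with the fact that $|E(G)| \le pn^2$ (see Property (P3) of
$\mathcal{G}$), implies Lemma 2.4(c).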
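5.4.1 Perfect Olympiads

Recall that $\kappa = \sum_{P \in \mathcal{P}_{\Pi_1}} c_P$ is the total volume of the myriad. By Proposition
4.8 (see Example 4.9)

\[ \kappa \sim m^{\nu+\phi} p^{\phi}. \]

Let

\[ \kappa_0 = \frac{\kappa}{t^{\nu+\phi}} \]

be the average volume of an olympiad. Setting

\[ c_O = \sum_{P \in O} c_P, \]

we have, again by Proposition 4.8, that for all olympiads $O$

\[ c_O \sim \kappa_0. \]

Let $O^{\mathrm{reg}}$ denote the set of $\delta$-regular polyads contained in $O$, and let $O^{\mathrm{irr}} =
O \setminus O^{\mathrm{reg}}$ be the set of irregular ones. Write

\[ c_O = c_O^{\mathrm{reg}} + c_O^{\mathrm{irr}}, \]

where $c_O^{\mathrm{reg}} = \sum_{P \in O^{\mathrm{reg}}} c_P$ and $c_O^{\mathrm{irr}} = \sum_{P \in O^{\mathrm{irr}}} c_P$, and set

\[ s_O = \sum_{P \in O} s_P. \]

Finally, let $O^{\mathrm{nice}}$ denote the set of nice polyads contained in $O$ and

\[ c_O^{\mathrm{nice}} = \sum_{P \in O^{\mathrm{nice}}} c_P. \]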

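Definition 5.1 We call an olympiad $O$ perfect if the following three conditions hold:

\[ c_O^{\mathrm{nice}} \ge (1 - \delta)\kappa_0, \qquad (29) \]
\[ c_O^{\mathrm{reg}} \ge (1 - \sqrt{\delta})\kappa_0, \qquad (30) \]

and

\[ s_O \ge \alpha_4 p^{\phi} \kappa_0. \qquad (31) \]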
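
As a reading aid, the three conditions of Definition 5.1 can be written as a single predicate. This is a minimal sketch under the assumption that (29)-(31) read $c_O^{\mathrm{nice}} \ge (1-\delta)\kappa_0$, $c_O^{\mathrm{reg}} \ge (1-\sqrt{\delta})\kappa_0$ and $s_O \ge \alpha_4 p^{\phi}\kappa_0$. The function name, the argument names and the numbers in the example are hypothetical.

```python
import math

def is_perfect(c_nice, c_reg, s, kappa0, delta, alpha4, p_phi):
    """Check conditions (29), (30), (31) for one olympiad.

    c_nice, c_reg: volumes of the nice and of the delta-regular polyads;
    s: number of special constellations; kappa0: average olympiad volume;
    p_phi: the scaling factor p**phi.
    """
    cond29 = c_nice >= (1 - delta) * kappa0
    cond30 = c_reg >= (1 - math.sqrt(delta)) * kappa0
    cond31 = s >= alpha4 * p_phi * kappa0
    return cond29 and cond30 and cond31

# Illustrative numbers only.
print(is_perfect(c_nice=99.5, c_reg=95, s=1.0, kappa0=100,
                 delta=0.01, alpha4=0.5, p_phi=0.01))
```

An olympiad fails to be perfect as soon as one of the three thresholds is missed, which is why the proof of Lemma 5.4 counts the failures of (29) and (30) separately and subtracts them from the number of olympiads satisfying (31).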
Lemma 5.4 At least $\frac{\alpha_4}{8D^*}\, t^{\nu+\phi}$ olympiads are perfect.

Proof: By Property (P5) of $\mathcal{G}$,

\[ |E(F)| \sim \phi m^2 p. \]

Let us begin with a count of the edges of $F$ belonging to fingers which are
either $(\varepsilon, p)$-irregular or of $p$-density smaller than $3\varepsilon^{1/(2\phi)}$. Let us call such
fingers nasty. First, from the $(\varepsilon, p)$-uniformity of partition $\Pi_1$, there are at
most $\varepsilon |E(F)|$ edges belonging to fingers that are $(\varepsilon, p)$-irregular.

Next note that every tube is an induced subgraph of $F$ split into at
most $l$ fingers, so even if they were all of $p$-density smaller than $3\varepsilon^{1/(2\phi)}$, they
would contribute no more than a $3l\varepsilon^{1/(2\phi)}$-fraction of the edges. As we have
$l \le \varepsilon^{-1/(4\phi)}$ and $\varepsilon \le \varepsilon^{1/(4\phi)}$, altogether there are fewer than $4\varepsilon^{1/(4\phi)} |E(F)|$ edges
which belong to nasty fingers of $F$.

Furthermore, by Property (P3) of $\mathcal{G}$, any given edge may belong to at
most $n^{\nu-1}(2np)^{\phi-1}$ constellations. Call a constellation spoiled if at least one
of its edges belongs to a nasty finger. Then, recalling the definition of a nice
polyad from Section 5.1.3, $c_O - c_O^{\mathrm{nice}}$ counts precisely those constellations of
olympiad $O$ which are spoiled. Thus,

\[ \sum_{O} \left( c_O - c_O^{\mathrm{nice}} \right) \le (1 + o(1))\, 4\varepsilon^{1/(4\phi)}\, \phi m^2 p\, n^{\nu-1} (2np)^{\phi-1}. \]
o 

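On the other hand, denoting by $T_1^-$ the number of olympiads which do not
satisfy condition (29), we have

\[ \sum_{O} \left( c_O - c_O^{\mathrm{nice}} \right) \ge T_1^- \left( \delta - o(1) \right) \kappa_0, \]

which, using the definitions of $\varepsilon$, $\kappa_0$, $\kappa$, and $m$, and the relation $c_O \sim \kappa_0$,
yields the bound $T_1^- \le (1 + o(1))(\alpha_4/4D^*)\, t^{\nu+\phi}$.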

Next, let $T_2^-$ be the number of olympiads which do not satisfy condition (30). We claim that $T_2^- \le \sqrt{\delta}\, t^{\nu+\phi} \le (\alpha_4/8D^*)\, t^{\nu+\phi}$, where the latter
inequality follows just by the definition of $\delta$. Indeed, if $T_2^- > \sqrt{\delta}\, t^{\nu+\phi}$ then

\[ \sum_{P\ \delta\text{-irregular}} c_P = \sum_{O} c_O^{\mathrm{irr}} > \sqrt{\delta}\, t^{\nu+\phi} \cdot \sqrt{\delta}\, \kappa_0 = \delta\kappa, \]

a contradiction with the $\delta$-regularity of $\Pi_1$ (compare Definition 4.12).

Finally, let $T_3^+$ be the number of olympiads fulfilling condition (31). We
claim that $T_3^+ \ge \frac{2}{3}(\alpha_4/D^*)\, t^{\nu+\phi}$. Indeed, by Proposition 4.12 and the fact
that $F$ has no $(D^*, p^{\phi})$-dense $S$-patches, we have, for every olympiad $O$,

s = Sp+ Sp - D V^/log 2 n + D y ^ c P 

c P <re/log 2 n c P >K/\og 2 n P&O 

= O (p*«o)+Dy«o<|i>y« . (*) 

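Recall from Section 3 that $|\mathcal{S}| = \alpha_3 n^{\nu}$, and suppose that $T_3^+ < \frac{2}{3}(\alpha_4/D^*)\, t^{\nu+\phi}$.
Then, by (*) and the definition of $T_3^+$,

\[ |\mathcal{S}| < \frac{2\alpha_4}{3D^*}\, t^{\nu+\phi} \cdot \frac{3}{2} D^* p^{\phi} \kappa_0 + t^{\nu+\phi} \alpha_4 p^{\phi} \kappa_0 = 2\alpha_4 p^{\phi} \kappa < \alpha_3 n^{\nu} \]

- a contradiction. Hence, we conclude that there are at least

\[ T_3^+ - T_1^- - T_2^- \ge \left( \frac{2\alpha_4}{3D^*} - \frac{\alpha_4}{4D^*} - \frac{\alpha_4}{8D^*} - o(1) \right) t^{\nu+\phi} > \frac{\alpha_4}{8D^*}\, t^{\nu+\phi} \]

perfect olympiads. □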

Recall that $O^{\mathrm{healthy}} = O \cap \mathcal{P}^{\mathrm{healthy}}$ is the set of healthy polyads contained
in an olympiad $O$. The next lemma will be used in the proof of Lemma 5.6.

Lemma 5.5 For a perfect olympiad $O$

\[ \sum_{P \in O^{\mathrm{healthy}}} c_P \ge \frac{\alpha_4}{2D^*}\, \kappa_0. \]

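Proof: Because there are no $(D^*, p^{\phi})$-dense $S$-patches in $F$, we have $c_P \ge
s_P/(D^* p^{\phi})$ for each polyad $P$ with $c_P \ge \kappa/\log^2 n$. Hence

\[ \sum_{P \in O^{\mathrm{healthy}}} c_P \ge \frac{1}{D^* p^{\phi}} \sum_{\substack{P \in O^{\mathrm{healthy}} \\ c_P \ge \kappa/\log^2 n}} s_P. \]

Clearly,

\[ \sum_{\substack{P \in O^{\mathrm{healthy}} \\ c_P \ge \kappa/\log^2 n}} s_P \ge s_O - \sum_{\substack{P \notin O^{\mathrm{healthy}} \\ c_P \ge \kappa/\log^2 n}} s_P - \sum_{c_P < \kappa/\log^2 n} s_P. \]

Recalling the definition of a healthy polyad, to estimate further we must then
subtract from $s_O$ the contribution coming from polyads with $c_P \ge \kappa/\log^2 n$
that are either not nice, or $\delta$-irregular, or of $(S, p^*)$-density less than $\delta$, and
also from polyads with $c_P < \kappa/\log^2 n$. First, since $O$ satisfies condition (29),

\[ \sum_{\substack{P \notin O^{\mathrm{nice}} \\ c_P \ge \kappa/\log^2 n}} s_P \le D^* p^{\phi} \sum_{P \notin O^{\mathrm{nice}}} c_P = D^* p^{\phi} \left( c_O - c_O^{\mathrm{nice}} \right) \le (1 + o(1))\, \delta D^* p^{\phi} \kappa_0. \]

Second, using the fact that $O$ obeys condition (30),

\[ \sum_{\substack{P \in O^{\mathrm{irr}} \\ c_P \ge \kappa/\log^2 n}} s_P \le D^* p^{\phi} \sum_{P \in O^{\mathrm{irr}}} c_P = D^* p^{\phi} \left( c_O - c_O^{\mathrm{reg}} \right) \le (1 + o(1)) \sqrt{\delta}\, D^* p^{\phi} \kappa_0. \]

Third,

\[ \sum_{P : d_P < \delta} s_P = \sum_{P : d_P < \delta} d_P\, p^{\phi} c_P \le (1 + o(1))\, \delta p^{\phi} \kappa_0. \]

Last, by Proposition 4.12,

\[ \sum_{c_P < \kappa/\log^2 n} s_P \le (1 + o(1))\, D^* p^{\phi} \frac{\kappa}{\log^2 n} = o(1)\, p^{\phi} \kappa_0. \]

Putting all this together and using the fact that $O$ fulfills (31) and thus
$s_O \ge \alpha_4 p^{\phi} \kappa_0$, we finally get

\[ \sum_{\substack{P \in O^{\mathrm{healthy}} \\ c_P \ge \kappa/\log^2 n}} s_P \ge \left( \alpha_4 - \delta D^* - \sqrt{\delta} D^* - \delta - o(1) \right) p^{\phi} \kappa_0 \ge \left( \alpha_4 - 3\sqrt{\delta} D^* - o(1) \right) p^{\phi} \kappa_0 \ge \left( \alpha_4 - \tfrac{3}{8}\alpha_4 - o(1) \right) p^{\phi} \kappa_0 \]

and consequently

\[ \sum_{P \in O^{\mathrm{healthy}}} c_P \ge \frac{\alpha_4}{2D^*}\, \kappa_0. \]

□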
5.4.2 Precores Are Large

We now concentrate on the contribution to the precore $J$ coming from healthy
polyads of one perfect olympiad. Recall our notation

\[ J_O = \bigcup_{P \in O^{\mathrm{healthy}}} J_P. \]

The following lemma, together with Lemma 5.4, completes the proof of Lemma
5.3, which, in turn, finishes off the proof of Lemma 2.4, yielding Theorem
2.1, and thus our main result, Theorem 1.1.

Lemma 5.6 For every precore $J \in \mathrm{PRECORE}$ and for every perfect olympiad
$O$, we have

\[ |J_O| \ge \frac{\alpha_4\, p\, m^2}{10 D^* \phi^2 t^2}. \]

Proof. The idea of the proof of Lemma 5.6 is to take only one finger from
each polyad $P$, namely the one chosen by the majority of the vertices in the definition of $J_P$, and then show, using Lemma 5.5 and Proposition 4.8, that
many of those fingers will belong to the same tube. As such, they will be
edge-disjoint, and adding up their contributions to the respective $J_P$'s will
give the desired lower bound on $|J_O|$.

Recall the definition of the family PRECORE p from Section 5.1.3. For 
every P £ O health y and for every J p £ PRECORE p , let a P and b P be such 
that 



\v(J F ) n Kpl > Yt and ^ v e V ( J ) n < ■ b ^ = * 2ff 



Then ap and bp determine a finger of P, denoted further by F p 
belonging to the tube F® , . Obviously, 



F p 

apbp ' 



U JF 


> 


P(zQ healthy 





|J ( J p n F* 

p^Qhcalthy 

Note that by the definition of J p , by the (s,p) -regularity of fingers and the 
inequality e < 1/ (20) , 



k'nn>i(i-#-ri>5 



(32) 
where the last inequality follows because, say, both $\eta < 1/6$ and $\varepsilon < 1/6$.
For every a, b consider the edge-disjoint union F ab of selected fingers F p 
belonging to the tube Ff%, that is 



U 

ap=a,bp=b 



F p C F°. 



We claim that for some a, b 



\F°\ 

1* ab\ 



(33) 



Suppose to the contrary that the opposite inequality is true for all a, b, and 

consider the subgraph R of F which is a union of the complements F® b — F ab 
— o 

of F ab within the respective tubes. Then, since every healthy polyad of O 
— o 

must intersect some F ab , the subgraph R is a union of unhealthy polyads, 
and by Lemma 5.5 



cr <c - 



C*4 



K = 1 + 0(1) 



Q 1 



Kq. 



2D* V v ' 2D*. 
On the other hand, by property (P5) of Definition 6.1, tubes are (s,p)- 



regular of p-density 1 — o(l), and the graphs F^ — F ab are complements of 
unions of at most / (e, p)-regular fingers within the tube F® b . Thus, they are 
themselves (e',^) -regular, where e' < le < e 1 " 1 / 4 ^, with p-densities at least 
1 - a 4 /(3£>*0) + o(l). Thus, by Proposition 4.8 and by (33), 



m \ v +<t> 



o 

ab 



f!)> 



ab 



0(1) 



«4 



3D*( 



contradicting the above upper bound. 

Hence, let a , b be such that (33) holds. Then 



|J (J p nF> 

P(i£)hcalthy 



> 



U ( jP n f p ) 

ap=dQ ,bp=bQ 



E 

ap=<iQ Mp=b{) 



j p nF p \ 



Using (32) and (33) we easily estimate further to obtain the desired bound: 



E l^ P nf P l>E 

ap=ao,bp=bo 



\F° 



30 



> 



10D*d) 2 t 2 ' 



□ 
6 Random Graphs

In this section we state and prove some assertions about random graphs
which constitute the family $\mathcal{G}$ of graphs playing an important role in our
proof.

Let $G = (V, E)$ be a graph, and let $v, u \in V$, $U, W \subset V$, where $U \cap W = \emptyset$,
and $F \subset E$. Recall our notation

\[ \deg_G(v, W) = \deg(v, W) = |\{w \in W : vw \in E\}|, \]

\[ \deg_G(v) = \deg(v) = \deg(v, V), \]

\[ \mathrm{codeg}(v, u) = |\{w : vw, uw \in E\}|, \]

and $e_G(U)$ and $e_G(U, W)$ for the number of edges of $G$ with, resp., both
endpoints in $U$, and one endpoint in $U$ and the other in $W$. Furthermore,
$\mathrm{Base}(F)$ is the set of edges in the complete graph on $V = V(G)$ that form a
triangle with two edges in $F$, formally:

\[ \mathrm{Base}(F) = \{uv : uw, wv \in F \text{ for some } w \in V\}. \]
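
The notation just recalled translates directly into code. The following is a minimal sketch, assuming a graph is given as a list of undirected edges (pairs of comparable vertex labels); the helper names `neighbors`, `deg`, `codeg` and `base` are chosen here for illustration and do not come from the paper.

```python
from itertools import combinations

def neighbors(edges):
    """Adjacency sets of an undirected graph given as a list of pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def deg(adj, v, W=None):
    """deg(v, W): number of neighbours of v inside W (all of V if W is None)."""
    nbrs = adj.get(v, set())
    return len(nbrs) if W is None else len(nbrs & set(W))

def codeg(adj, v, u):
    """codeg(v, u): number of common neighbours of v and u."""
    return len(adj.get(v, set()) & adj.get(u, set()))

def base(F):
    """Base(F): pairs uv such that uw and wv are both in F for some w."""
    adj = neighbors(F)
    out = set()
    for nbrs in adj.values():
        # every pair of neighbours of a common vertex w closes a triangle
        out.update(combinations(sorted(nbrs), 2))
    return out

F = [(1, 2), (2, 3), (2, 4)]   # a star centred at vertex 2
print(sorted(base(F)))         # [(1, 3), (1, 4), (3, 4)]
```

In this example every pair of leaves of the star lies in Base(F), because the centre serves as the common vertex $w$.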
6.1 The Graph Family $\mathcal{G}$

Given a constant $\alpha$, a sequence $c/\sqrt{n} \le p = p(n) \le C/\sqrt{n}$ and an integer
$\nu > 5$, let $q$ and $\lambda$ be as in the Glossary and let $a = a(\lambda, c)$ be a constant
determined by Lemma 2.3. We now define the property $\mathcal{G}$ and show that it
holds for $G(n,p)$ asymptotically almost surely (a.a.s. for short), that is, with
probability tending to 1 as $n \to \infty$.

Definition 6.1 A graph $G = (V,E)$ on $n$ vertices is said to have property
$\mathcal{G} = \mathcal{G}(p, q, \lambda, a)$ if the following hold:

(P1) The number of sets of $\nu$ vertices that span an edge (i.e. are not independent sets) is $o(n^{\nu})$.

(P2) The number of sets of $\nu$ vertices such that some three of them share a
common neighbor is $o(n^{\nu})$.

(P3) For all $v \in V$,

\[ (1 - n^{-1/5})np \le \deg(v) \le (1 + n^{-1/5})np < 2np. \]

(P4) For all $v, u \in V$, $\mathrm{codeg}(v, u) \le 3\log n$.

(P5) For all pairs of disjoint sets $U, W \subset V$ with $|U|, |W| \ge n/\log n$,

\[ (1 - n^{-1/5})|U||W|p \le e_G(U,W) \le (1 + n^{-1/5})|U||W|p < 2|U||W|p, \]

and

\[ e_G(U) \le |U|^2 p. \]

(P6) For all $1 \le k \le 2q$, where $q$ is as in the Glossary, and all choices of
subsets $S_1, \dots, S_k$ of $V$, such that $|S_i| = s_i \le 2np$ for all $i = 1, \dots, k$,
and $|S_i \cap S_j| \le 12\log n$ for all $1 \le i < j \le k$, the number

\[ Z = Z(S_1, \dots, S_k) \]

of vertices with at least one neighbor in each $S_i$ satisfies

\[ |Z - \mu| \le n^{4/5}, \]

where $\mu = n \prod_{i=1}^{k} \mu_i$ and $\mu_i = 1 - (1 - p)^{s_i}$.

(P7) $G$ has property $T(\lambda, a)$, that is, for any subgraph $F$ of $G$ with at least
$\lambda|E|$ edges, the set $\mathrm{Base}(F)$ contains at least $a|V|^3$ triangles.
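
Properties (P3) and (P4) are easy to test on a concrete graph. The sketch below is an illustration only: it assumes a graph stored as a dictionary of adjacency sets, samples $G(n,p)$ naively with Python's standard `random` module, and checks the two conditions literally. The function names are hypothetical, and since the properties are asymptotic, a small sampled graph may well fail them.

```python
import math
import random
from itertools import combinations

def sample_gnp(n, p, seed=0):
    """Sample G(n, p): each of the C(n, 2) pairs is an edge independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def check_p3(adj, p):
    """(P3): every degree lies in [(1 - n^(-1/5)) n p, (1 + n^(-1/5)) n p]."""
    n = len(adj)
    tol = n ** (-1 / 5)
    return all((1 - tol) * n * p <= len(nb) <= (1 + tol) * n * p
               for nb in adj.values())

def check_p4(adj):
    """(P4): every pair of vertices has at most 3 log n common neighbours."""
    bound = 3 * math.log(len(adj))
    return all(len(adj[u] & adj[v]) <= bound for u, v in combinations(adj, 2))

# Deterministic example: the 10-cycle with p = 0.2, so that n p = 2.
cycle = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
print(check_p3(cycle, 0.2), check_p4(cycle))
```

For an experiment in the regime of the paper one could call, for instance, `sample_gnp(n, n ** -0.5)` for a moderately large `n` and apply both checks. The quadratic loops make this suitable only for small experiments.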

Lemma 6.1 For all sequences $p = p(n)$ such that $c/\sqrt{n} \le p \le C/\sqrt{n}$,

\[ \lim_{n \to \infty} \Pr(G(n,p) \in \mathcal{G}) = 1. \]

Proof: We will prove that each of the above properties holds a.a.s., so,
of course, their conjunction also holds a.a.s. The proofs of (P1) and (P2) rely on
Markov's inequality, whereas those of (P3)-(P6) rely on various versions of the
Chernoff bound, for which we refer the reader to Section 2.1 of [16]. Recall
that the set of neighbors of $v$ in $G$ is denoted by $N_G(v) = N(v)$, while $N_G(W)$
stands for the set of vertices outside $W$, each having at least one neighbor in
$W$. Setting $G = G(n,p)$, note that $|N_G(W)|$ is binomially distributed with
expectation $(n - |W|)(1 - (1 - p)^{|W|})$.

(P1) The expected number of such sets is $O(n^{\nu} p)$. Thus, by Markov's
inequality there are a.a.s. no more than, say, $n^{\nu}\sqrt{p} = o(n^{\nu})$ of them.

(P2) The expected number of such sets is $O(n^{\nu+1} p^3)$. Thus, by Markov's
inequality there are a.a.s. no more than, say, $n^{\nu+1} p^{5/2} = o(n^{\nu})$ of them.

(P3) For any $v \in V$, the degree $\deg(v)$ is a binomial random variable with
expectation $(n - 1)p$. Since $c\sqrt{n} \le np \le C\sqrt{n}$, the usual Chernoff bound
((2.9) in [16]) gives that

\[ \Pr\left( \deg(v) \notin [(1 - n^{-1/5})np,\ (1 + n^{-1/5})np] \right) \le \exp\{-\Theta(n^{1/10})\}. \]

A simple union bound shows that a.a.s. $(1 - n^{-1/5})np \le \deg(v) \le (1 +
n^{-1/5})np$ for all vertices $v$ simultaneously.

(P4) As above, for fixed $u, v$ the random variable in question, $\mathrm{codeg}(v,u)$,
is binomial with expectation $(n - 2)p^2 = O(1)$. From a handy version of
Chernoff's bound ((2.11) in [16]) it follows that

\[ \Pr(\mathrm{codeg}(u, v) \ge 3\log n) \le e^{-3\log n} = n^{-3}. \]

Hence, a.a.s. the required condition holds for all $\binom{n}{2}$ pairs $u, v$ simultaneously.



(P5) Again, given two disjoint sets of vertices $U, W \subset V$ with $|U|, |W| \ge
n/\log n$, the number $e_G(U, W)$ is a binomial random variable with expectation $|U||W|p = \Omega(n^{3/2}/\log^2 n)$. And, again, by the usual Chernoff bound,

\[ \Pr\left( d_p(U, W) \notin [1 - n^{-1/5},\ 1 + n^{-1/5}] \right) \le \exp\{-\Theta(n^{12/11})\}. \]

Since there are fewer than $2^{2n}$ choices of $U, W$, the required condition holds
a.a.s. for all such pairs simultaneously. The second statement is proved
similarly.

(P6) Split $Z = Z' + Z''$, where $Z'$ is the number of vertices $v$ counted by $Z$
such that $v \notin \bigcup_{i=1}^{k} S_i$ and $\deg(v, S_i \cap S_j) = 0$ for all $i < j$. Set

\[ \mu' = EZ', \quad \mu'' = EZ'' \quad \text{and} \quad \tilde{\mu} = EZ = \mu' + \mu''. \]

First, by the crude Chernoff bound ((2.12) in [16]) we have

\[ \Pr\left( |Z' - \mu'| \ge n^{4/5}/2 \right) \le 2\exp\left\{ -2 (n^{4/5}/2)^2/(n - s) \right\} \le 2\exp\{ -n^{3/5}/2 \}, \]

where $s = \sum_i s_i$. Observe also that there are at most



2np 



n 



2np 



2q 



< 



n 



2q 



exp |0(-v/nlogn)} 
choices of $S_1, \dots, S_k$. Hence

\[ |Z' - \mu'| \le n^{4/5}/2 \]

holds a.a.s. for all such choices.

Second, by (P3) we have $\mu'' = EZ'' = O((\log n)\sqrt{n})$. Let $U_{ij} = S_i \cap S_j$,
for $i, j = 1, \dots, k$. Observe that if $Z'' \ge n^{2/3}$ then for some $i \ne j$, we have

\[ \Big| \bigcup_{v \in U_{ij}} N_G(v) \Big| \ge (n^{2/3} - s)/k^2 > n^{5/9}. \]

Consequently, by the handy Chernoff bound ((2.11) in [16])

\[ P(\exists S_1, \dots, S_k : Z'' \ge n^{2/3}) \le P\left( \exists U \subset V,\ |U| \le 12\log n : |N_G(U)| \ge n^{5/9} \right) \le n^{12\log n} e^{-n^{5/9}} = o(1). \]

Therefore, a.a.s. for all choices of $S_1, \dots, S_k$, $1 \le k \le 2q$, we have

\[ |Z - \tilde{\mu}| \le |Z' - \mu'| + Z'' + \mu'' \le n^{4/5}/2 + n^{5/9} + O((\log n)\sqrt{n}) \le 2n^{4/5}/3. \]

Finally note that $|\tilde{\mu} - \mu| \le s = O(\sqrt{n})$.

(P7) This is Lemma 2.3, proved in the next section. 

We have thus completed the proof that $G(n,p) \in \mathcal{G}$ a.a.s. □
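
The first-moment and union-bound steps used for (P1), (P2) and (P4) can be sanity-checked numerically. The sketch below is not part of the proof; it simply evaluates the quantities for concrete parameters. It assumes the first-moment counts described in the proof (for (P2), a crude upper bound obtained by choosing the $\nu$-set, three of its vertices and a candidate common neighbor), and an exact binomial tail in place of the Chernoff bound. All function names are hypothetical.

```python
import math

def expected_p1(n, nu, p):
    """Exact expected number of nu-sets spanning at least one edge."""
    return math.comb(n, nu) * (1 - (1 - p) ** math.comb(nu, 2))

def expected_p2_upper(n, nu, p):
    """Upper bound on the expected number of nu-sets with three vertices sharing a neighbour."""
    return math.comb(n, nu) * math.comb(nu, 3) * n * p ** 3

def markov(expectation, threshold):
    """Markov's inequality: Pr(X >= threshold) <= E[X] / threshold."""
    return min(1.0, expectation / threshold)

def binom_tail(N, prob, k):
    """Pr(Bin(N, prob) >= k), with each term computed in log-space."""
    total = 0.0
    for j in range(k, N + 1):
        log_term = (math.lgamma(N + 1) - math.lgamma(j + 1) - math.lgamma(N - j + 1)
                    + j * math.log(prob) + (N - j) * math.log1p(-prob))
        total += math.exp(log_term)
    return total

def codeg_union_bound(n, p):
    """Union bound over all pairs for the event codeg >= 3 log n."""
    k = math.ceil(3 * math.log(n))
    return math.comb(n, 2) * binom_tail(n - 2, p * p, k)

n, nu, p = 10 ** 6, 5, 1e-3            # p = n^(-1/2)
print(markov(expected_p1(n, nu, p), n ** nu * math.sqrt(p)))
print(markov(expected_p2_upper(n, nu, p), n ** (nu + 1) * p ** 2.5))
print(codeg_union_bound(1000, 1000 ** -0.5))
```

Both Markov ratios are of order $\sqrt{p}$, in line with the $o(n^{\nu})$ claims, and the codegree union bound is far below the $\binom{n}{2} n^{-3}$ guaranteed in the proof of (P4).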
6.2 Proof of Lemma 2.3

Here we prove Lemma 2.3, which turned out to be both difficult and interesting. Work on it led to a separate paper [9] where its extension is utilized in
a wider context of dynamic Ramsey-type colorings of random graphs. The
proof provided in [9] is based on a counting technique for sparse random
graphs. In this paper we prefer our original approach which, besides a regularity lemma, uses an upper tail estimate established in [31]. We state it
here in a general setting of random subsets.

Given a finite set $\Gamma$ and $0 < p < 1$, we denote by $\Gamma_p$ a random binomial
subset of $\Gamma$. Note that $([n]^2)_p$ is nothing else but a random graph $G(n,p)$. Let
$\mathcal{H} \subseteq [\Gamma]^h$, where $h$ is a positive integer, and set $\mu = E|\{H \in \mathcal{H} : H \subseteq \Gamma_p\}|$.
Lemma 6.2 For all integers $\beta > 0$, with probability at least $1 - \exp\{-\beta/(2h)\}$
there is $E_0 \subseteq \Gamma_p$ with $|E_0| = \beta$, such that $\Gamma_p \setminus E_0$ contains fewer than $2\mu$ sets
from $\mathcal{H}$.

For more about upper tail estimates see Section 2.6 in [16], [18] and [17],
and for further development of the above deletion technique see [19].

Corollary 6.3 A.a.s. for each pair $U, W \subset [n]$, $U \cap W = \emptyset$, there exists
$E_0 \subset G(n,p)$ with $|E_0| = n\log n$, such that the bipartite subgraph of $G(n,p) \setminus
E_0$ spanned between the sets $U$ and $W$ contains no more than

\[ 2\binom{|U|}{2}\binom{|W|}{2} p^4 \]

copies of the 4-cycle $C_4$.

Proof: Apply Lemma 6.2 to $\Gamma = U \times W$ and $\mathcal{H}$, the family of the edge
sets of all 4-cycles in $\Gamma$, with $h = 4$, $\beta = n\log n$ and $\mu = \binom{|U|}{2}\binom{|W|}{2} p^4$.

□

Proof of Lemma 2.3 

Given $c$ and $\lambda$, let $d = \min\{\lambda^3 c^2/270,\ \lambda^4/729\}$, and let $\varrho = \varrho(d)$ and
$c_0 = c_0(d)$ be determined via Proposition 4.4 (applied with $k = 3$). Further,
let $\varepsilon = \min\{\varrho, \lambda/100\}$, and let $T_0 = T_0(\varepsilon)$ be determined by Lemma 4.10
with $D = 2$, $t_0 = 3$ (say), $r = 1$, and the above $\varepsilon$ as inputs. We promise to
prove Lemma 2.3 with $a = c_0/T_0^3$.

Let $G$ be a graph from the space $G(n,p)$ which satisfies Properties (P3)
and (P5) and the property stated in Corollary 6.3. Note that Property (P5)
implies that $G$, as well as each of its subgraphs, has no $(2,p)$-dense patches.
Fix $F \subseteq G$ with $|F| \ge \lambda|G| \ge (\lambda/3)pn^2$ and apply Lemma 4.10 to $F$ with
$D$, $t_0$, $r$, and $\varepsilon$ as above, obtaining a partition $V = W_1 \cup \dots \cup W_t$ with
$3 \le t \le T_0$ such that all but $\varepsilon n^2 p$ edges of $F$ belong to the $(\varepsilon, p)$-regular pairs
$(W_i, W_j)$. Let $F'$ be the subgraph of $F$ consisting of those edges, and let
$x\binom{t}{2}$ be the number of pairs $(i,j)$ such that $e_{F'}(W_i, W_j) \ge (\lambda/3)p(n/t)^2$. For
clarity of presentation, let us assume that $t$ divides $n$, and so $|W_i| = n/t$ for
each $i = 1, \dots, t$ (cf. Remark 5.1). We will now estimate $x$ from below.
By (P5) and the definition of e, 
(A/3)pn 2 < |F'| < 2xQ(n/t) 2 p + (Yf) (^/3)(n/*) 2
and so 

A < x + (1 -x)A/4 . 
Solving the last inequality for x, we obtain x > A/6. Hence, at least 

(V6)Q) >(A/10)t 2 

pairs $(W_i, W_j)$ are $(\varepsilon, p)$-regular, each satisfying

\[ e_F(W_i, W_j) \ge (\lambda p/3)(n/t)^2. \]

By simple averaging, there must be an index $i_0$ and a set $J \subseteq [t] \setminus \{i_0\}$ of
cardinality $|J| = \lambda t/10$ such that $(W_{i_0}, W_j)$ is as above for each $j \in J$. Set $W = W_{i_0}$ and
$U = \bigcup_{j \in J} W_j$, and note that $|W| = n/t$ and

\[ |U| = |J|(n/t) = \lambda n/10. \]

Let $B$ be the bipartite subgraph of $F$ induced by all edges with one endpoint in $U$ and the other in $W$, shortly $B = F[U, W]$. Then $\mathrm{Base}(B)[W] \subseteq
\mathrm{Base}(F)$. We will show that $\mathrm{Base}(B)[W]$ is $(\varrho, d)$-dense, and hence, by
Lemma 4.4, for $n$ large enough, contains at least $c_0(n/t)^3 \ge (c_0/T_0^3)n^3$ triangles, which will complete the proof.

To this end, assume for clarity that $\varrho n/t$ is an integer, and pick any
$W' \subseteq W$ with $|W'| = \varrho|W| = \varrho n/t$. Our ultimate goal is to show that

\[ N = |\mathrm{Base}(B)[W']| \ge d\binom{|W'|}{2}. \]

Since for each $j \in J$ the pair $(W_j, W)$ is $(\varepsilon, p)$-regular with density at
least

\[ \pi = \lambda p/3 \]

and $|W'| \ge \varepsilon|W|$, by Proposition 4.7, all but at most $\varepsilon|W_j|$ vertices of $W_j$
have each at least $(\lambda/3 - \varepsilon)p|W'| = (1 - \varepsilon')\pi|W'|$ neighbors in $W'$, where
$\varepsilon' = 3\varepsilon/\lambda > \varepsilon$. Consequently, all but at most $\varepsilon|U|$ vertices of $U$ have each at
least $(1 - \varepsilon')\pi|W'|$ neighbors in $W'$.

Let $E_0 \subset G[U, W']$ be as in Corollary 6.3, i.e. $|E_0| = n\log n$ and there are
at most $2\binom{|U|}{2}\binom{|W'|}{2}p^4$ copies of $C_4$ in $G_0 = G[U, W'] \setminus E_0$. Clearly, the same
is true for the subgraph $B_0 = F[U, W'] \setminus E_0$ of $G_0$. As, say, only at most
$n^{2/3}$ vertices of $U$ can be each incident to more than $n^{1/3}\log n$ edges of $E_0$,
still, say, all but at most $2\varepsilon|U|$ vertices of $U$ have each at least $(1 - 2\varepsilon')\pi|W'|$
neighbors in $W'$. Put in another way, for all but at most $2\varepsilon|U|$ vertices $u \in U$

\[ d_u = \deg_{B_0}(u) \ge (1 - 2\varepsilon')\pi|W'|. \]

Let $[W']^2 = \{e_1, \dots, e_{\binom{|W'|}{2}}\}$. Denote by $x_i$ the number of vertices $u \in U$
which are adjacent in $B_0$ to both elements of the pair $e_i$, that is, which are the
tips of the tepees of $B_0$ over $e_i$. Clearly, $N = |\{i : x_i > 0\}|$. For convenience,
assume that $x_i > 0$ for $i = 1, \dots, N$. Observe that for sufficiently large $n$,
by the choice of $\varepsilon$,

f> = E ( > l {1 ~ °( n-1/2 ))( 1 " ^T\u\* 2 \wT > ^\u\\wf , 

and that J2iLi (2O ec L ua l s the number of copies of C 4 in B . Consequently, 



Tl\1 



i=l 

By the Cauchy-Schwarz inequality, 

JV 



i=l 



Consider two cases: 
I. N > J2 x i/ 2 - Then 



iV > - f/7T 2 \W '\ 2 > - X X — — ' ' > d ' 

~ 6 1 1 1 1 ~ 3 10 9V2/-V2 



as required. 






II. N < Then, by (34), 



N > (E^) 2 > (i/9)Au\ 2 \wf > (\w 



4 E R) ~ 2 P 4 \u\ 2 \w\ 2 - V 2 

as needed. 



□ 
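
The counting at the heart of the two cases above rests on two elementary facts: $\sum_i \binom{x_i}{2}$ counts the copies of $C_4$ with both "bottom" vertices in $W'$, and the Cauchy-Schwarz inequality gives $N \ge (\sum_i x_i)^2 / \sum_i x_i^2$. The following is a small sketch illustrating these facts on a toy bipartite graph. It is not the paper's argument with its constants, and the data layout (a dictionary mapping each vertex of $U$ to its neighbor set in $W$) and the function names are assumptions made for the example.

```python
from itertools import combinations
from math import comb

def tepee_counts(B, W_prime):
    """x[e] = number of vertices u in U adjacent to both endpoints of the pair e."""
    x = {}
    for e in combinations(sorted(W_prime), 2):
        x[e] = sum(1 for nbrs in B.values() if e[0] in nbrs and e[1] in nbrs)
    return x

def base_size_and_bound(B, W_prime):
    """Return N = #{e : x_e > 0}, the Cauchy-Schwarz lower bound on N, and the C4 count."""
    x = tepee_counts(B, W_prime)
    N = sum(1 for c in x.values() if c > 0)
    s1 = sum(x.values())
    s2 = sum(c * c for c in x.values())
    c4 = sum(comb(c, 2) for c in x.values())   # each C4 = two tips over one pair
    lower = s1 * s1 / s2 if s2 else 0.0
    return N, lower, c4

B = {"u1": {1, 2, 3}, "u2": {1, 2}, "u3": {2, 3}}
N, lower, c4 = base_size_and_bound(B, {1, 2, 3})
print(N, lower, c4)
```

Here the pairs (1,2) and (2,3) each have two tips and (1,3) has one, so $N = 3$, the Cauchy-Schwarz bound is $25/9$, and there are two 4-cycles. The identity $\sum x_i^2 = \sum x_i + 2\sum\binom{x_i}{2}$ is what lets an upper bound on the number of 4-cycles be turned into a lower bound on $N$.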



7 Summary, Further Remarks, Glossary 

When embarking on the journey of solving this problem we did not realize the 
range of techniques which would be required, nor did we predict the extent 
of technical difficulties involved. Already one paper, [9] , has sprung as a side 
effect, solving a problem which arose during our work, and addressing themes 
related to it. We hope in the future to follow the various trails that we have 
encountered here. 

There are several directions in which the results of this paper may be 
generalized or extended. We state below the corresponding open problems. 
It seems that in order to solve some of them one will need to develop a 
better understanding of sparse regular graphs and "online" Ramsey theory 
as described in [9]. 

The natural questions that come to mind are with regard to generaliz- 
ing the main theorem of this paper to the case of coloring with more than 
two colors, or the case of Ramsey properties where the defining forbidden 
monochromatic graph is not a triangle but some other graph. We conjecture 
that an analog of our main theorem in this paper holds for all these cases, that 
is, that for all such natural Ramsey properties there exists a sharp threshold. 
These problems seem to be within the grasp of the techniques of this paper 
but present some serious technical difficulties. It is our hope that we will be 
able to overcome these difficulties in future work. 

It is also of interest to study thresholds of Ramsey properties of random 
sets of integers, such as having a monochromatic arithmetic progression in 
every bi-coloring of the set (see [31, 21, 32]).

A different question which seems at the moment quite hard is whether 
one can improve the result in this paper by establishing the exact threshold 
probability for the Ramsey property, or even proving that it exists: 
Question 7.1 Does the function c(n) defined in Theorem 1.1 tend to a limit
as n tends to infinity? If so what is the exact value of the constant it tends 
to? 

It is worthwhile noting that this natural question has not been answered in 
any of the cases where the existence of a sharp threshold has been established 
using the techniques from [8] (i.e. in [8], [1] and [10]). 
7.2 Glossary

Any fan of 19th-century Russian literature is familiar with the frustration
of reading a novel with a huge number of characters, each having two or 
three names and nicknames. The timid western reader often finds comfort 
in a short list added by the editors to the translation, a list of all characters, 
each accompanied by a reminder of their family background. We present 
here such a list that may help the reader of this paper, who may refer back 
to it while reading. We also define here the appropriate choices to ensure 
the needed relations between various constants. We end the glossary with a 
diagram which gives the dependency between the constants. 
Constants



c, C are absolute constants given by Theorem 1.2; we have c = C2 = 1/e 
and C = C2 = 10 4 ; their role is to bound the threshold probability p(n) 
scaled by y/n (see Theorem 2.1); 

$\xi > 0$ and $0 < \alpha < 1$ are arbitrary constants at the outset of Theorem 2.1;

$\nu$ is the number of vertices of $M$, an arbitrary balanced graph with
average degree 4 at the outset of Theorem 2.1;

$q = \lceil 10C^2\nu/\alpha \rceil$ is an upper bound on the number of edges in the star
forest $S$, a prototype of constellations; used to define Properties (P6)
and (P7) of $\mathcal{G}$;

a\ - no such thing; 

«o = — — r- is an intermediate constant in Section 3; 

a-i = — -7 — r - there are at least a-m v special constellations in 

2(i/ + 

each G G Q\ 

(u + qY + " a 

«4 = — «3 = ^ n . — is an intermediate constant m 

3C 2 i 6C 2 4(2v + q y 

Section 5; 

$D$ is a parameter bounding from above the $p$-density of patches (large
subgraphs) of $G$ (see Definition 4.5), which is a crucial assumption in
all sparse regularity lemmas; in our application of these lemmas, owing
to Property (P5) of $\mathcal{G}$, we take $D = 2$;

$D^*$ is a parameter bounding from above the $(S, p^*)$-density of $S$-patches
of $F$ (see Definition 4.6), which is a crucial assumption in the Subgraph
Regularity Lemma 4.13; in our application of this lemma we take $D^* =
2^q + 1$ (see Lemma 4.17);

$\lambda = \frac{1}{160}\left( \frac{\alpha_4}{D^* q(\nu + q)} \right)^{2}$ - every core has size at least $\lambda pn^2$;
$a = a(\lambda)$ is a constant determined by Lemma 2.3; both $\lambda$ and $a$ are
used to define Property (P7) of $\mathcal{G}$;

a 2 £ 6 c 6 

t = is a value of r with which we apply Theorem 2.1; 

^{a^c 6 + 24 t) C t) ) 

$\tau > 0$ is an arbitrary constant at the outset of Theorem 2.1;

T 

7] > is chosen so that ?7log(l/?7) < ^ ; this parameter serves to 

define cores and the above bound guarantees that there are few of 
them; 

S = mm i.0Ar] q , (^^j j"' is an input of the Subgraph Regularity 

Lemma 4.13; the final partition Hi resulting from Lemma 4.13 with H 
being any star forest with at most q edges, is guaranteed to be 5-regular; 

e{£) = mm { ( — — - f" 4 ~ - \ " , q~\ 1 , is a func- 

tion of variable £, at the outset of the Subgraph Regularity Lemma 
4.13; 

$T_0$, $L_0$ are output constants of the Subgraph Regularity Lemma 4.13;

$t$, $l$ are not really constants but vary with $G$; the final partition $\Pi_1$
resulting from the Subgraph Regularity Lemma 4.13 with $H$ being any
star forest with at most $q$ edges, is a $(t, l)$-partition for some $t \le T_0$
and $l \le L_0$; do not confuse $l$ with $\ell$, the argument of the function $\varepsilon(\ell)$
defined above;

$\varepsilon = \varepsilon(l)$ - the final partition $\Pi_1$ resulting from the Subgraph Regularity
Lemma 4.13 with $H$ being any star forest with at most $q$ edges, is
guaranteed to be $(\varepsilon, p)$-uniform; also, this $\varepsilon$ satisfies the assumptions
of Proposition 4.8 with $D = 2$ and $k = q$;

$n_0$ is a lower bound on $n$, different at different places in the paper; in
particular it is an output constant of the Subgraph Regularity Lemma
4.13;

• $n_1$ is the lower bound on $n$ promised in Theorem 2.1; it is the maximum
of all values of $n_0$ encountered throughout our proof, most notably the
$n_0$ of Lemma 4.13, as well as of several implicit lower bounds on $n$
hidden in our calculations;

Graphs

• $\rho(H) = |E|/|V|$ is half of the average degree in a graph $H = (V, E)$;

• $\mathcal{R}$ is the family of all graphs such that every bi-coloring of their edges
results in a monochromatic triangle; they are referred to as Ramsey
graphs;

• $\mathcal{G}$ is a family of graphs which depends on $c, C, \alpha, \nu, p = p(n)$ and is
defined in Definition 6.1; it is crucial for our proof that $G(n,p) \in \mathcal{G}$
almost surely (see Theorem 2.1 and Lemma 6.1);

• $T(\lambda, a)$ is a family of graphs $G = (V, E)$ such that for any subgraph $F$
of $G$ with at least $\lambda|E|$ edges, the set $\mathrm{Base}(F)$ contains at least $a|V|^3$
triangles (for the definition of $\mathrm{Base}(F)$ see pages 15 or 82); we have
$\mathcal{G} \subset T(\lambda, a)$ with $\lambda$ as above;

• $M$ is an arbitrary, balanced graph with $\nu$ vertices and $2\nu$ edges, at the
outset of Theorem 2.1; we put $V(M) = \{x_1, \dots, x_\nu\}$;

• $G = (V, E)$ is an arbitrary member of $\mathcal{G}$ with more than $n_0$ vertices; it
is assumed that $G \notin \mathcal{R}$, but $G$ satisfies (3); the whole proof boils down
to showing that it also satisfies (4);

• $M_X$ is an ordered copy of $M$ with vertex set $X = \{v_1, \dots, v_\nu\} \subset V$ and
such that $x_a$ is mapped onto $v_a$, for each $a = 1, \dots, \nu$;

• M* is a random copy Mx, where X is chosen uniformly over all (n)„ 
possibilities; 

• M' x is Mx with edges colored by a fixed coloring a'; 

• T(X) is the set of all tepees in G formed over all pairs of vertices from 
X which are edges of Mx (see Definition 3.1 for the definition of a 
tepee); 
X is the subgraph of G formed by the edges of T(X);

M is a common isomorphic type of all Mx with X G X 2 ; it has v + 
vertices and 20 edges, 1 < < g, that is, it consists of tepees over M; 

S(X), for X G ^3, called a special constellation, is a subgraph of X 
consisting of the left legs of all tepees from T(X) (see Section 3.2); 

S is the set of all special constellations; 

S is the common isomorphism type of all S(X) with X G X3; S is a 
forest of stars; 

is the number of edges of the star forest S; the stars forming S have 
0i, . . . , (f) v edges, where = 0i + ■ ■ ■ + y ; 

7T = {Vi, . . . , V v , Wn, Wifa,. . . , W V) i, W^}, is an initial equipar- 
tition of V; for clarity, we further assume that all the sets are of equal 
size m = njiv + 0); 

C is the set of all constellations, that is copies of S which are consistent 
with 7To; 

F a b = G[V a , W a b], a — 1, . . . , u, b — 1, . . . , a , is the bipartite subgraph 
of G consisting of all edges with one endpoint in V a and the other in 
W ab] 

F = [j ab F a b is the subgraph of G consisting of all edges connecting 
sets V a with W a b', in Section 4, F also stands for an arbitrary member 
of J-(H, m); 

H always stands for an arbitrary fixed graph with h vertices, mostly in 
Section 4, where the general form of the Subgraph Regularity Lemma 
4.13 is proved; it is later applied with H = S; 

J-(H, m) denotes the family of all spanning subgraphs of if m , where 
H m is the /i-partite graph obtained by replacing each vertex x of H 
by a set V x of m vertices, and by replacing every edge xy of H by the 
complete bipartite graph K m<m spanned between V x and V y ; 
• #(H, F) is our notation for the number of copies of H in F, sometimes
it only counts copies which are consistent with a specified partition 
of V(F); 

• $\Pi_0$ is an initial partition of $F$ consisting of $\pi_0$ and the (trivial partitions
of) subgraphs $F_{ab}$;

• $\Pi_1$ is a final partition of $F$ resulting from the Subgraph Regularity
Lemma 4.13; it consists of vertex equipartitions $V_a = V_a^1 \cup \dots \cup V_a^t$ and
$W_{ab} = W_{ab}^1 \cup \dots \cup W_{ab}^t$ (under a simplifying assumption, all these sets
are of equal size $m/t$), and subgraph partitions $F_{ab}^{ij} = G[V_a^i, W_{ab}^j] =
\bigcup_k F_{ab}^{ij,k}$; note that $F = \bigcup_{a,b,i,j,k} F_{ab}^{ij,k}$;

• $\Pi$ stands for an arbitrary (or current) $(t, l)$-partition (see Definition
4.8);

• $\Psi$ stands for a refinement of $\Pi$ (see Definition 4.8);

• $P$ always stands for a polyad - a special kind of a subgraph of $F$,
always related to (consistent with) a partition $\Pi$ (see Definition 4.9);
in Section 5 all polyads are consistent with the final partition $\Pi_1$;

• $\mathcal{P}_{\Pi} = \mathcal{P}$ is the set of all polyads consistent with a partition $\Pi$;

• $R$, generally, is a subgraph of $F$, often a subgraph of a given polyad
$P$; in the proof of Lemma 4.13, $R \subseteq P$ represents polyads which are
consistent with a partition $\Psi$ which refines $\Pi$;

Quantities depending on n 

• $n \ge n_0$ is the number of vertices of $G$;

• $m = n/(\nu + \phi)$ is the number of vertices in each part of the partition
$\pi_0$ of the vertices of $G$ (we assume this is an integer);

• $p = p(n)$ is a sequence of edge probabilities satisfying $c \le p\sqrt{n} \le C$; it
is also used for an abstract scaling factor in the formula for $p$-density
(see Section 4);

• $e_G(U, W)$ is the number of edges of a graph $G$ with one endpoint in $U$
and the other in $W$; here $U$ and $W$ are disjoint subsets of $V$;

• $d(B) = d(U, V) = \frac{e_G(U,V)}{|U||V|}$ is the density of the pair $(U, V)$; also called
the density of the bipartite graph $B$, where $B = G[U, V]$;

• $d_p(B) = d_p(U, V) = \frac{e_G(U,V)}{p|U||V|}$ is, for $0 < p < 1$, the $p$-density of the
pair $(U, V)$; also called the $p$-density of the bipartite graph $B$, where
$B = G[U, V]$;

• $p^*$ is an abstract normalizing factor for the $(H, p^*)$-density $d_R$ of subgraphs $R$ of $F$; for our application with $H = S$ we take $p^* = p^{\phi}$;

• $\kappa = |\mathcal{C}|$ is the total number of constellations (consistent with $\pi_0$ copies
of $S$) in $F$; we have $\kappa \sim m^{\nu+\phi} p^{\phi}$;

• $c_R = |\mathcal{C}(R)|$, where $\mathcal{C}(R)$ is the set of constellations in $R$;

• $s_R = |\mathcal{S}(R)|$, where $\mathcal{S}(R)$ is the set of special constellations in $R$;

• $d_R = \frac{s_R}{p^* c_R}$ is the (normalized) $(H, p^*)$-density of $R$; for our application
with $H = S$ we take $p^* = p^{\phi}$;
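
The density notions listed above are simple ratios, and a short sketch may help keep the normalizations apart. It assumes a graph stored as a dictionary of adjacency sets and reads the definitions as $d(U,V) = e_G(U,V)/(|U||V|)$, $d_p(U,V) = e_G(U,V)/(p|U||V|)$ and $d_R = s_R/(p^* c_R)$. The function names are hypothetical.

```python
def edges_between(adj, U, W):
    """e_G(U, W): number of edges with one endpoint in U and the other in W (U, W disjoint)."""
    W = set(W)
    return sum(len(adj.get(u, set()) & W) for u in U)

def density(adj, U, W):
    """d(U, W) = e_G(U, W) / (|U| |W|)."""
    return edges_between(adj, U, W) / (len(U) * len(W))

def p_density(adj, U, W, p):
    """d_p(U, W) = e_G(U, W) / (p |U| |W|): the density rescaled by p."""
    return density(adj, U, W) / p

def hp_density(s_R, c_R, p_star):
    """d_R = s_R / (p* c_R): special constellations per constellation, rescaled by p*."""
    return s_R / (p_star * c_R)

# Complete bipartite graph K_{2,3} with sides U = {0, 1} and W = {2, 3, 4}.
adj = {0: {2, 3, 4}, 1: {2, 3, 4}, 2: {0, 1}, 3: {0, 1}, 4: {0, 1}}
print(density(adj, [0, 1], [2, 3, 4]), p_density(adj, [0, 1], [2, 3, 4], 0.5))
```

In a random graph $G(n,p)$ the $p$-density of a large pair is close to 1, which is the content of Property (P5), so a pair of $p$-density 2 is twice as dense as a typical pair.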



The flowchart of constants 

For the convenience of the reader we have included the following
chart indicating the interdependencies of the various constants that we have used
in our proofs.